Luke Kenneth Casson Leighton wrote:
We fell for the fantasy of parallel writes with BerkeleyDB, but after a dozen+ years of poking, profiling, and benchmarking, it all becomes clear - all of that locking overhead+deadlock detection/recovery is just a waste of resources.
... which is why tdb went to the other extreme, to show it could be done.
But even tdb only allows one write transaction at a time. I looked into writing a back-tdb for OpenLDAP back in 2009, before I started writing LMDB. I know pretty well how tdb works...
quote:
"The new code is faster at indexing and searching, but not so much faster it would blow you away, even using LMDB. Turns out the slowness of Python looping trumps the speed of a fast datastore :(. The difference might be bigger on a big index; I'm going to run experiments on the Enron dataset and see."
interesting. so why is read up at 5,000,000 per second under python (in a python loop, obviously) but write isn't? something odd there.
Good question. I'd guess there's some memory allocation overhead involved in writes. The Whoosh guys have some more perf stats here
https://bitbucket.org/mchaput/whoosh/wiki/Whoosh3
(their test.Tokyo / All Keys result is highly suspect though, the timing is the same for 100,000 keys as for 1M keys. Probably a bug in their test code.)
On Sun, May 18, 2014 at 12:15:45PM -0700, Howard Chu wrote:
But even tdb only allows one write transaction at a time. I looked into writing a back-tdb for OpenLDAP back in 2009, before I started writing LMDB. I know pretty well how tdb works...
Okay, transactional, safe writes are slow. True. But non-transactional writes have improved significantly in the very recent past. We gained a lot from mutexes (we had to find out how badly the Linux fcntl locks really suck ...), and also from spreading the load from the freelist to the dead records in neighboring hash chains. I don't have any microbenchmarks, but larger-scale benchmarks benefit a lot from those two changes.
I would like to give lmdb a try in Samba, really. I see that for 32-bit systems we will probably still need tdb for the future (pread/pwrite in lmdb anyone in the meantime? :-)). The other blocker, when I last took a serious look, was that crashed processes could have harmful effects. Has this changed in the meantime, with automatic cleanup and/or robust mutexes? I know those might be a bit slower, but I would love to at least offer our users the choice.
Volker
Volker Lendecke wrote:
Okay, transactional, safe writes are slow. True. But non-transactional writes have improved significantly in the very recent past. We gained a lot from mutexes (we had to find out how badly the Linux fcntl locks really suck ...), and also from spreading the load from the freelist to the dead records in neighboring hash chains. I don't have any microbenchmarks, but larger-scale benchmarks benefit a lot from those two changes.
(And of course, fcntl only works for inter-process locking. We needed thread support, which also required mutexes.)
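The remark that fcntl only works for inter-process locking can be seen directly: POSIX record locks are owned by the process, not by the thread, so a second thread in the same process "acquires" a write lock its sibling already holds. The sketch below assumes Linux/glibc; the helper names and the temporary file are illustrative, and it uses only the standard `fcntl(F_SETLK)` and pthreads calls.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Non-blocking request for an exclusive POSIX record lock on byte 0 of fd.
   Returns 0 on success, -1 if another process holds a conflicting lock. */
static int try_write_lock(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    return fcntl(fd, F_SETLK, &fl);
}

struct lock_attempt {
    int fd;
    int result;
};

static void *second_thread(void *arg)
{
    struct lock_attempt *a = arg;
    /* Same process as the current lock holder, so no conflict is detected. */
    a->result = try_write_lock(a->fd);
    return NULL;
}

/* The main thread takes the lock, then a second thread asks for the same lock.
   Returns the second thread's fcntl() result (0 means it "got" the lock too),
   or -1 if the lock was refused or setup failed. */
static int second_thread_lock_result(void)
{
    char path[] = "/tmp/fcntl-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays usable until fd is closed */

    struct lock_attempt a = { fd, -1 };
    if (try_write_lock(fd) == 0) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, second_thread, &a) == 0)
            pthread_join(tid, NULL);
    }
    close(fd); /* closing the descriptor also drops this process's locks on the file */
    return a.result;
}
```

A pthread mutex used the same way would make the second thread block instead, which is the gap the parenthetical above describes.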
I would like to give lmdb a try in Samba, really. I see that for 32-bit systems we will probably still need tdb for the future (pread/pwrite in lmdb anyone in the meantime? :-)).
That will require app-level buffer cache mgmt and lots of memcpys. Kinda defeats the design of LMDB.
The other blocker, when I last took a serious look, was that crashed processes could have harmful effects. Has this changed in the meantime, with automatic cleanup and/or robust mutexes? I know those might be a bit slower, but I would love to at least offer our users the choice.
Hallvard has a test branch with robust mutex support, we need to look into merging it...
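For readers unfamiliar with robust mutexes: POSIX lets a process-shared mutex be marked robust, so that if its owner dies while holding it, the next `pthread_mutex_lock()` returns `EOWNERDEAD` instead of hanging forever, and the new owner calls `pthread_mutex_consistent()` once it has repaired any shared state. The sketch below is a minimal standalone demonstration assuming Linux/glibc; it is not the code from Hallvard's branch, and the helper name is made up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* A child process takes a robust, process-shared mutex and exits without
   unlocking it. Returns the pthread_mutex_lock() result the parent then sees
   (EOWNERDEAD is expected), or -1 on setup failure. */
static int lock_after_owner_crash(void)
{
    /* The mutex must live in memory shared by both processes. */
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_mutex_lock(m); /* take the lock ... */
        _exit(1);              /* ... and "crash" while still holding it */
    }
    waitpid(pid, NULL, 0);

    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        /* A real application would check and repair the data protected by
           the mutex here, then mark the mutex usable again. */
        pthread_mutex_consistent(m);
    }
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(m);

    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return rc;
}
```

Without `PTHREAD_MUTEX_ROBUST`, the parent's lock call in this scenario would simply never return.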
On Sun, May 18, 2014 at 01:01:26PM -0700, Howard Chu wrote:
I would like to give lmdb a try in Samba, really. I see that for 32-bit systems we will probably still need tdb for the future (pread/pwrite in lmdb anyone in the meantime? :-)).
That will require app-level buffer cache mgmt and lots of memcpys. Kinda defeats the design of LMDB.
Yes, I know. I don't advocate this in the default case. In the normal mode, where the db fits into memory, I would not change anything. However, it might be possible with clever inline functions to add an optional slower layer for the small boxes, with zero cost for the mmap case. Small NAS boxes are really important for us.
If I gave such a thing a try, and assuming I get anywhere, would you consider taking a look? I'm not sure I will spend much time on this soon, but I would like to know whether that's a doomed attempt right from the start. lmdb is just too cool to ignore :-)
Volker
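Volker's idea of an optional slower layer that costs nothing when mmap is available could look roughly like the hypothetical sketch below. It is not LMDB code: `struct env_hyp`, `get_page()`, `PAGE_SIZE_HYP` and the demo function are invented for illustration. With a mapping, fetching a page is pointer arithmetic; without one, every page access turns into a `pread()` into a caller-supplied buffer, i.e. a copy.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SIZE_HYP 4096 /* hypothetical page size for the sketch */

/* Hypothetical environment: either the whole file is mapped, or it is not. */
struct env_hyp {
    int fd;
    unsigned char *map; /* NULL means "no mmap, fall back to pread" */
};

/* Return a pointer to page pgno. With a mapping this is just pointer
   arithmetic (no copy); otherwise the page is read into buf. */
static inline const unsigned char *get_page(const struct env_hyp *env,
                                            uint64_t pgno, unsigned char *buf)
{
    if (env->map)
        return env->map + pgno * PAGE_SIZE_HYP;
    ssize_t n = pread(env->fd, buf, PAGE_SIZE_HYP, (off_t)(pgno * PAGE_SIZE_HYP));
    return n == PAGE_SIZE_HYP ? buf : NULL;
}

/* Demo: write two pages ('A' then 'B'), read page 1 both ways and compare.
   Returns 1 if both paths see the same bytes, 0 if not, -1 on setup failure. */
static int pread_and_mmap_agree(void)
{
    char path[] = "/tmp/pages-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    unsigned char pages[2 * PAGE_SIZE_HYP];
    memset(pages, 'A', PAGE_SIZE_HYP);
    memset(pages + PAGE_SIZE_HYP, 'B', PAGE_SIZE_HYP);

    int ok = -1;
    if (write(fd, pages, sizeof pages) == (ssize_t)sizeof pages) {
        unsigned char *map = mmap(NULL, sizeof pages, PROT_READ, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            struct env_hyp mapped = { fd, map };
            struct env_hyp unmapped = { fd, NULL };
            unsigned char buf[PAGE_SIZE_HYP];
            const unsigned char *a = get_page(&mapped, 1, NULL); /* zero-copy */
            const unsigned char *b = get_page(&unmapped, 1, buf); /* copied */
            ok = a && b && a[0] == 'B' && memcmp(a, b, PAGE_SIZE_HYP) == 0;
            munmap(map, sizeof pages);
        }
    }
    close(fd);
    return ok;
}
```

The sketch deliberately leaves out the part Howard objects to: a usable pread/pwrite backend would also need an application-level page cache with eviction, dirty-page tracking and write-back, and code that holds on to page pointers would likely have to pin pages in that cache. On a 32-bit build, files over 2 GiB would additionally need `_FILE_OFFSET_BITS=64` so that `off_t` is 64 bits wide.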
Volker Lendecke wrote:
Yes, I know. I don't advocate this in the default case. In the normal mode, where the db fits into memory, I would not change anything. However, it might be possible with clever inline functions to add an optional slower layer for the small boxes, with zero cost for the mmap case. Small NAS boxes are really important for us.
If I gave such a thing a try, and assuming I get anywhere, would you consider taking a look? I'm not sure I will spend much time on this soon, but I would like to know whether that's a doomed attempt right from the start. lmdb is just too cool to ignore :-)
Hmmm. I suppose it's a matter of timing. 64bit ARM chips are hitting the market now. Frankly I view this as a problem that will solve itself, and not worth lifting a finger over.
On Sun, May 18, 2014 at 03:51:38PM -0700, Howard Chu wrote:
Hmmm. I suppose it's a matter of timing. 64bit ARM chips are hitting the market now. Frankly I view this as a problem that will solve itself, and not worth lifting a finger over.
Just found via lwn.net:
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2014-May/000335....
:-)
Volker
Volker Lendecke wrote:
Just found via lwn.net:
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2014-May/000335....
Interesting thread, but mostly it reiterates that no 32 bit systems will be viable past 2038. So there's a hard limit of 24 years remaining for these machines.
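The 2038 cutoff comes from a signed 32-bit count of seconds since 1970-01-01 UTC, the traditional `time_t` on 32-bit Unix ABIs. Its largest value, 2^31 - 1 = 2,147,483,647 seconds, falls on 19 January 2038, which is where the 24 years (2038 - 2014) come from. The small check below converts that value with the standard `gmtime()`; the macro and helper names are illustrative.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Largest value of a signed 32-bit seconds counter: 2^31 - 1 = 2147483647.
   One second later such a counter cannot represent the time; with the usual
   two's-complement wraparound it jumps to -2^31, i.e. December 1901. */
#define LAST_INT32_SECOND ((time_t)INT32_MAX)

/* Break that instant down into a UTC calendar date (tm_year counts from 1900). */
static struct tm last_int32_second_utc(void)
{
    time_t t = LAST_INT32_SECOND;
    struct tm out;
    memset(&out, 0, sizeof out);
    const struct tm *p = gmtime(&t); /* standard C; not thread-safe, fine for a demo */
    if (p != NULL)
        out = *p;
    return out;
}
```

The result is 2038-01-19 03:14:07 UTC.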
There may well be a plethora of 32 bit microcontrollers still running strong by then, but how many of them will be capable of managing more than 4GB of data? Do you really need a 16GB database in your smart toaster, smart refrigerator, or whatever networked appliance in the Internet of Things?
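The 4GB figure is the size of a 32-bit address space (2^32 bytes). Because LMDB maps the entire database file into memory, the database has to fit in whatever part of that space the process can actually use; on 32-bit Linux with the common 3G/1G user/kernel split, that is at most 3 GiB before the program's own code, heap and stacks are subtracted, consistent with the 2-3GB estimate later in the thread. The helper below is a hypothetical back-of-the-envelope check, not an LMDB API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can a database of db_bytes be mapped in one piece when
   the address space is 2^ptr_bits bytes and reserved_bytes of it are not
   available to the mapping (kernel split, program text, heap, stacks)? */
static bool fits_in_address_space(uint64_t db_bytes, unsigned ptr_bits,
                                  uint64_t reserved_bytes)
{
    if (ptr_bits >= 64) /* 1 << 64 would overflow; approximate with UINT64_MAX */
        return db_bytes <= UINT64_MAX - reserved_bytes;
    uint64_t space = (uint64_t)1 << ptr_bits; /* 32 bits -> 4 GiB */
    if (reserved_bytes >= space)
        return false;
    return db_bytes <= space - reserved_bytes;
}
```

For example, with 1 GiB reserved for the kernel, a 2 GiB database still fits on a 32-bit system, while a 16GB database like the one in the toaster example only fits with 64-bit pointers.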
On Thu, May 22, 2014 at 01:33:10AM -0700, Howard Chu wrote:
Interesting thread, but mostly it reiterates that no 32 bit systems will be viable past 2038. So there's a hard limit of 24 years remaining for these machines.
There may well be a plethora of 32 bit microcontrollers still running strong by then, but how many of them will be capable of managing more than 4GB of data? Do you really need a 16GB database in your smart toaster, smart refrigerator, or whatever networked appliance in the Internet of Things?
You never know. All I know is that Samba will need the 32-bit option for quite some time to come. If we went full steam with lmdb, we would need to maintain two database engines for the same purpose. I'd like to avoid this if at all possible.
Volker
Volker Lendecke wrote:
You never know. All I know is that Samba will need the 32-bit option for quite some time to come. If we went full steam with lmdb, we would need to maintain two database engines for the same purpose.
Only if you really expect 32 bit micro servers to manage more than 2-3GB of data. That doesn't seem realistic to me. Of course, I have only a vague feel for what the data is we're talking about. AFAICS this is filesystem metadata, which must be of much smaller volume than any actual file data being managed by Samba.
I'd like to avoid this if at all possible.
Fundamentally, this is unavoidable. Even if we cram the desired functionality into LMDB, it would end up being a different database engine, just hiding under the same name. The code paths would be completely different from the 64 bit code.