Luke Kenneth Casson Leighton wrote:
We fell for the fantasy of parallel writes with BerkeleyDB, but after a dozen-plus years of poking, profiling, and benchmarking it all becomes clear: all of that locking overhead and deadlock detection/recovery is just a waste of resources.
... which is why tdb went to the other extreme, to show it could be done.
But even tdb only allows one write transaction at a time. I looked into writing a back-tdb for OpenLDAP back in 2009, before I started writing LMDB. I know pretty well how tdb works...
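For anyone who hasn't seen it from the API side, here's a minimal sketch of that single-writer model as it looks through the py-lmdb binding (the path and the key/value bytes are made up for illustration): readers run lock-free against a snapshot, while at most one write transaction is open at a time.

```python
# Minimal sketch of LMDB's single-writer model via the py-lmdb binding.
# The database path and key/value contents are arbitrary examples.
import lmdb

env = lmdb.open('/tmp/demo-lmdb', map_size=1 << 30)  # 1 GiB map

# Only one write transaction can be open at a time; a second writer in
# another thread or process simply blocks until this one commits.
with env.begin(write=True) as txn:
    txn.put(b'key1', b'value1')

# Readers never block and take no locks; they see a consistent snapshot
# of the data as of the moment the read transaction began.
with env.begin() as txn:
    print(txn.get(b'key1'))
```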
quote:
"The new code is faster at indexing and searching, but not so much faster it would blow you away, even using LMDB. Turns out the slowness of Python looping trumps the speed of a fast datastore :(. The difference might be bigger on a big index; I'm going to run experiments on the Enron dataset and see."
Interesting. So why are reads up at 5,000,000 per second under Python (in a Python loop, obviously) while writes aren't? Something odd there.
Good question. I'd guess there's some memory allocation overhead involved in writes. The Whoosh guys have some more perf stats here:
https://bitbucket.org/mchaput/whoosh/wiki/Whoosh3
(Their test.Tokyo / All Keys result is highly suspect, though: the timing is the same for 100,000 keys as for 1M keys. Probably a bug in their test code.)
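As a sanity check on the "Python looping trumps the datastore" point, a rough micro-benchmark along these lines (not the Whoosh test code; the key count and path are arbitrary) makes the read/write asymmetry easy to reproduce. The interpreter loop overhead is the same on both sides, so whatever gap remains comes from the write path (copy-on-write page updates and allocation) rather than from any locking.

```python
# Rough micro-benchmark sketch: time a pure-Python loop of puts vs. gets
# against py-lmdb. Key count, path, and map size are arbitrary choices.
import time
import lmdb

N = 100_000
env = lmdb.open('/tmp/bench-lmdb', map_size=1 << 30)
keys = [b'%08d' % i for i in range(N)]

t0 = time.perf_counter()
with env.begin(write=True) as txn:      # one write transaction for all puts
    for k in keys:
        txn.put(k, k)                   # write path: page copies, allocation
t1 = time.perf_counter()

with env.begin() as txn:                # read-only snapshot, no locking
    for k in keys:
        txn.get(k)
t2 = time.perf_counter()

print('writes/s: %.0f' % (N / (t1 - t0)))
print('reads/s:  %.0f' % (N / (t2 - t1)))
```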