Luke Kenneth Casson Leighton wrote:
We fell for the fantasy of parallel writes with BerkeleyDB, but after a dozen-plus years of poking, profiling, and benchmarking it all becomes clear: all of that locking overhead and deadlock detection/recovery is just a waste of resources.
... which is why tdb went to the other extreme, to show it could be done.
But even tdb only allows one write transaction at a time. I looked into writing a back-tdb for OpenLDAP back in 2009, before I started writing LMDB. I know pretty well how tdb works...
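For anyone who hasn't seen it from the API side, here's a minimal sketch of that single-writer model as it looks through the py-lmdb binding (the path and the key/value bytes are made up for illustration): readers run lock-free against a snapshot, while at most one write transaction is open at a time.

```python
# Minimal sketch of LMDB's single-writer model via the py-lmdb binding.
# The database path and key/value contents are arbitrary examples.
import lmdb

env = lmdb.open('/tmp/demo-lmdb', map_size=1 << 30)  # 1 GiB map

# Only one write transaction can be open at a time; a second writer in
# another thread or process simply blocks until this one commits.
with env.begin(write=True) as txn:
    txn.put(b'key1', b'value1')

# Readers never block and take no locks; they see a consistent snapshot
# of the data as of the moment the read transaction began.
with env.begin() as txn:
    print(txn.get(b'key1'))
```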
quote:
"The new code is faster at indexing and searching, but not so much faster it would blow you away, even using LMDB. Turns out the slowness of Python looping trumps the speed of a fast datastore :(. The difference might be bigger on a big index; I'm going to run experiments on the Enron dataset and see."
Interesting. So why are reads up at 5,000,000 per second under Python (in a Python loop, obviously) while writes aren't? Something odd there.
Good question. I'd guess there's some memory allocation overhead involved in writes. The Whoosh guys have some more perf stats here:
https://bitbucket.org/mchaput/whoosh/wiki/Whoosh3
(Their test.Tokyo / All Keys result is highly suspect, though: the timing is the same for 100,000 keys as for 1M keys. Probably a bug in their test code.)
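As a sanity check on the "Python looping trumps the datastore" point, a rough micro-benchmark along these lines (not the Whoosh test code; the key count and path are arbitrary) makes the read/write asymmetry easy to reproduce. The interpreter loop overhead is the same on both sides, so whatever gap remains comes from the write path (copy-on-write page updates and allocation) rather than from any locking.

```python
# Rough micro-benchmark sketch: time a pure-Python loop of puts vs. gets
# against py-lmdb. Key count, path, and map size are arbitrary choices.
import time
import lmdb

N = 100_000
env = lmdb.open('/tmp/bench-lmdb', map_size=1 << 30)
keys = [b'%08d' % i for i in range(N)]

t0 = time.perf_counter()
with env.begin(write=True) as txn:      # one write transaction for all puts
    for k in keys:
        txn.put(k, k)                   # write path: page copies, allocation
t1 = time.perf_counter()

with env.begin() as txn:                # read-only snapshot, no locking
    for k in keys:
        txn.get(k)
t2 = time.perf_counter()

print('writes/s: %.0f' % (N / (t1 - t0)))
print('reads/s:  %.0f' % (N / (t2 - t1)))
```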