Howard Chu wrote:
A couple of new results for back-mdb as of today.
                     first      second     slapd size
back-hdb, 10K cache  3m6.906s   1m39.835s   7.3GB
back-hdb, 5M cache   3m12.596s  0m10.984s  46.8GB
back-mdb             0m19.420s  0m16.625s   7.0GB
back-mdb             0m15.041s  0m12.356s   7.8GB
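For context, the "first" and "second" columns are the times for an ldapsearch scanning the entire DB, run back to back; the second run presumably benefits from warm caches. A rough sketch of that kind of search follows - the URI and base DN are illustrative, not the exact values used here:

    # full-scan search, timed; run it a second time for the warm-cache case
    time ldapsearch -x -H ldap://localhost -b "dc=example,dc=com" \
        '(objectClass=*)' > /dev/null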
Next, the time to execute multiple instances of this search was measured, using 2, 4, 8, and 16 ldapsearch instances running concurrently.

average result time
               2          4          8          16
back-hdb, 5M   0m14.147s  0m17.384s  0m45.665s  17m15.114s
back-mdb       0m16.701s  0m16.688s  0m16.621s   0m16.955s
back-mdb       0m12.009s  0m11.978s  0m12.048s   0m12.506s
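A sketch of how the concurrent case can be driven from the shell, reusing the same full-scan search as above (N, the URI, base DN, and file names are all illustrative):

    # launch N copies of the full scan and record each instance's elapsed time
    N=8
    for i in $(seq 1 $N); do
        ( time ldapsearch -x -H ldap://localhost -b "dc=example,dc=com" \
            '(objectClass=*)' > /dev/null ) 2> scan.$i.time &
    done
    wait
    # the per-instance "real" times in scan.*.time are then averaged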
That 17-minute result for back-hdb just didn't make any sense. Going back over the setup, I discovered I'd made a newbie mistake: slapd was using the libdb-4.7.so that Debian bundles instead of the one I had built in /usr/local/lib. The LD_LIBRARY_PATH setting I normally keep in my .profile had apparently been commented out while I was working on some other stuff.
While loading a 5 million entry DB for SLAMD testing, I went back and rechecked these results and got much more reasonable numbers for hdb. Most likely the main difference is that Debian builds BDB with its default mutex configuration, a hybrid that starts with a spinlock and eventually falls back to a pthread mutex. Spinlocks are nice and fast, but only for a small number of processors. Since they rely on atomic instructions that lock the memory bus, the coherency traffic they generate is quite heavy, and it grows geometrically with the number of processors involved.
I always build BDB with an explicit --with-mutex=POSIX/pthreads to avoid the spinlock code. Linux futexes are decently fast, and scale much better as the number of processors goes up.
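For anyone reproducing this, the build I mean is along these lines (the version and install prefix are shown only for illustration):

    # build BDB with pure POSIX mutexes instead of the hybrid spinlock/pthread default
    cd db-4.7.25/build_unix
    ../dist/configure --with-mutex=POSIX/pthreads --prefix=/usr/local
    make && make install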
With slapd linked against my build of BDB 4.7, and using the 5 million entry database instead of the 3.2M entry database I used before, the numbers make much more sense.
slapadd -q times
          real        user         sys
back-hdb  66m09.831s  115m52.374s  5m15.860s
back-mdb  29m33.212s   22m21.264s  7m11.851s
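The load itself was just a quick-mode slapadd, roughly like this (the config path and LDIF name are illustrative):

    time slapadd -q -f slapd.conf -l 5M-entries.ldif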
ldapsearch scanning the entire DB
              first      second     slapd size  DB size
back-hdb, 5M  4m15.395s  0m16.204s  26GB        15.6GB
back-mdb      0m14.725s  0m10.807s  10GB        12.8GB
multiple concurrent scans, average result time
              2          4          8          16
back-hdb, 5M  0m24.617s  0m32.171s  1m04.817s  3m04.464s
back-mdb      0m10.789s  0m10.842s  0m10.931s  0m12.023s
You can see that up to 4 concurrent searches, the BDB spinlocks would probably have been faster; above that, you need to get rid of them. If I had realized I was using the default BDB build, I could of course have configured the BDB environment with set_tas_spins in the DB_CONFIG file. We always used to set this to 1, overriding the BDB default of (50 * number of CPUs), before we decided to drop the spinlocks entirely at configure time.
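For completeness, that tweak is just a line in the environment's DB_CONFIG file, something like the following (I haven't re-checked the exact parameter spelling across all BDB releases):

    # disable test-and-set spins so mutexes fall straight through to pthread mutexes
    set_tas_spins 1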
But I think this also illustrates another facet of MDB: reduced configuration complexity, which leaves a much smaller range of mistakes that can be made.
Re: the slapd process size - in my original test I configured a 32GB BDB environment cache. This particular LDIF only needed an 8GB cache, so that's the size I used this time around. The 46.8GB size reported earlier was the Virtual size of the process, not the Resident size; that was also a mistake, since all of the other numbers are Resident sizes.
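For reference, the BDB cache is set in the same DB_CONFIG file; an 8GB cache would look something like the following (a single cache region, shown only as an illustration):

    # set_cachesize <gigabytes> <bytes> <number of cache regions>
    set_cachesize 8 0 1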
When contemplating the design of MDB, I had originally estimated that we could save somewhere around 2-3x in RAM compared to BDB. With slapd's footprint 2.7x larger under BDB than under MDB on this test database, that estimate has proved to be correct.
The MVCC approach has also proved its value, with no bottlenecks for readers and response times essentially flat as the number of CPUs in use scales up.
There are still problem areas that need work, but it looks like we're on the right track, and what started out as possibly-a-good-idea is delivering on the promise.