A bit of a summary of how the backend is shaping up. I've been testing with a variety of synthetic LDIFs as well as an actual application database (Zimbra accounts).
I noted before that back-mdb's write speeds on disk are quite slow. This is because a lot of its writes will be to random disk pages, and also the data writes in a transaction commit are followed by a meta page write, which always involves a seek to page 0 or page 1 of the DB file. For slapadd -q this effect can be somewhat hidden because the writes are done with MDB_NOSYNC specified, so no explicit flushes are performed. In my current tests with synchronous writes, back-mdb is one half the speed of back-bdb/hdb. (Even in fully synchronous mode, BDB only writes its transaction logs synchronously, and those are always sequential writes so there's no seek overhead to deal with.)
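As a rough sketch of what that looks like at the API level (not the actual slapadd code - the path is a placeholder and error handling is minimal), a bulk load along these lines opens the environment with MDB_NOSYNC and syncs once at the end:

    /* Sketch only, not the actual slapadd code: open the MDB environment
     * with MDB_NOSYNC so commits skip the explicit flush, then sync once
     * when the load is finished.  Assumes a 64-bit build. */
    #include "mdb.h"

    int bulk_open_example(void)
    {
        MDB_env *env;
        int rc;

        if ((rc = mdb_env_create(&env)) != 0)
            return rc;
        mdb_env_set_mapsize(env, (size_t)32 * 1024 * 1024 * 1024);  /* 32GB map */
        rc = mdb_env_open(env, "/path/to/db", MDB_NOSYNC, 0664);
        if (rc == 0) {
            /* ... run the import transactions here ... */
            mdb_env_sync(env, 1);    /* one explicit flush at the end */
        }
        mdb_env_close(env);
        return rc;
    }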
With that said, slapadd -q for a 3.2M entry database on a tmpfs:
back-hdb:  real 75m32.678s   user 84m31.733s   sys 1m0.316s
back-mdb:  real 63m51.048s   user 50m23.125s   sys 13m27.958s
For back-hdb, BDB was configured with a 32GB environment cache. The resulting DB directory consumed 14951004KB including data files and environment files.
For back-mdb, MDB was configured with a 32GB mapsize. The resulting DB directory consumed 18299832KB. The input LDIF was 2.7GB, and there were 29 attributes indexed. Currently MDB is somewhat wasteful with space when dealing with the sorted-duplicate databases that are used for indexing; there's definitely room for improvement here.
Also this slapadd was done with tool-threads set to 1, because back-mdb only allows one writer at a time anyway. There is also obviously room for improvement here, in terms of a bulk-loading API for the MDB library.
With the DB loaded, the time to execute a search that scans every entry in the DB was measured against each server.
Initially back-hdb was only configured with a cachesize of 10000 and IDLcachesize of 10000. It was tested again using a cachesize of 5,000,000 (which is more than was needed since the DB only contained 3,200,100 entries). In each configuration a search was performed twice - once to measure the time to go from an empty cache to a fully primed cache, and again to measure the time for the fully cached search.
                      first       second      slapd size
back-hdb, 10K cache   3m6.906s    1m39.835s   7.3GB
back-hdb, 5M cache    3m12.596s   0m10.984s   46.8GB
back-mdb              0m19.420s   0m16.625s   7.0GB
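For reference, the back-hdb cache settings being varied here are the usual slapd.conf directives; a minimal sketch of the first configuration (the rest of the database section is omitted):

    # back-hdb entry and IDL cache settings (first configuration above)
    database        hdb
    cachesize       10000
    idlcachesize    10000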
Next, the time to execute multiple instances of this search was measured, using 2, 4, 8, and 16 ldapsearch instances running concurrently.

average result time    2            4            8            16
back-hdb, 5M           0m14.147s    0m17.384s    0m45.665s    17m15.114s
back-mdb               0m16.701s    0m16.688s    0m16.621s    0m16.955s
I don't recall doing this test against back-hdb on ada.openldap.org before; certainly the total blowup at 16 searches was unexpected. But as you can see, with no read locks in back-mdb, search performance is pretty much independent of load. At 16 threads back-mdb slowed down measurably, but that's understandable given that the rest of the system still needed CPU cycles here and there. Otherwise, slapd was running at 1600% CPU the entire time. For back-hdb, slapd maxed out at 1467% CPU; the lock overhead drove it into the ground.
So far I'm pretty pleased with the results; for the most part back-mdb is delivering on what I expected. Decoding each entry every time is a bit of a slowdown, compared to having entries fully cached. But the cost disappears as soon as you get more than a couple requests running at once.
Overall I believe it proves the basic philosophy - in this day and age, it's a waste of application developers' time to incorporate a caching layer into their own code. The OS already does it and does it well. Give yourself as transparent a path as possible between RAM and disk using mmap, and don't fuss with it any further.
back-mdb was first feature-complete a week ago. Between then and now I've spent a bit of time profiling its behavior and eliminating hot spots. For this work I used a test database of 250,000 synthetic entries, running under valgrind's callgrind tool and my FunctionCheck profiler. (Occasionally the two tools would disagree on where hot spots were, so I wound up targeting both.) The callgrind output is available on http://highlandsun.com/hyc/mdb_search/ for reference.
With the initial back-mdb code, which was basically back-bdb with all of its caching logic removed, an ldapsearch that scanned the entire DB ran in 8,875,046,744 instructions. The biggest hot spot was memnrcmp(), which is the libmdb function for comparing two strings in reverse byte order. The top 10 functions were:
1,828,659,111  memnrcmp
1,151,097,812  strncasecmp
  911,117,166  mdb_search_node
  628,388,775  avl_find
  612,549,560  ad_keystring
  600,502,442  entry_decode
  494,045,070  slap_bv2ad
  376,177,636  mdb_search_page
  285,882,273  attr_index_name_cmp
  199,751,242  mdb_cursor_set
(That basically corresponded to commit e5b1dce6a7904e0eb31029959865730fc813ce57)
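For anyone unfamiliar with it, a reverse-byte-order comparison just walks the two keys from their last bytes back toward their first. A rough sketch of the idea (not the exact memnrcmp code):

    /* Sketch of a reverse-byte-order comparison, not the exact memnrcmp:
     * compare the two byte strings starting from their last bytes. */
    #include <stddef.h>

    static int reverse_cmp(const unsigned char *a, size_t alen,
                           const unsigned char *b, size_t blen)
    {
        const unsigned char *pa = a + alen;
        const unsigned char *pb = b + blen;
        size_t n = alen < blen ? alen : blen;

        while (n--) {
            int diff = *--pa - *--pb;
            if (diff)
                return diff;
        }
        return alen < blen ? -1 : alen > blen ? 1 : 0;
    }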
-=-=-=-
The next step was to eliminate slap_bv2ad() from the entry_decode() path, using numeric IDs for attributeDescriptions in the database. Rewriting slapd entry_decode as mdb_entry_decode() with this change brought the total search execution down to 5,410,551,759 instructions. The top 10 functions were:
1,823,974,563  memnrcmp
  796,099,511  mdb_search_node
  528,502,136  mdb_entry_decode
  376,251,165  mdb_search_page
  199,751,329  mdb_cursor_set
  167,097,364  strncasecmp
  151,524,069  attr_clean
  143,741,422  mdb_get_page
   87,494,764  cursor_push_page
   83,500,204  is_ad_subtype

(commit 1e32fcf099ba8c15333365fe68aefa5217ae3d8c)
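The idea, roughly (illustrative types only, not the actual back-mdb structures): each attribute in a stored entry carries a small numeric ID that indexes a table built when the database is opened, so decoding resolves it with one array lookup instead of the string parse/lookup that slap_bv2ad() had to do:

    /* Illustrative sketch only: resolving an attribute by its stored
     * numeric ID is a single bounds-checked array lookup. */
    #include <stddef.h>

    struct attr_desc;                 /* stands in for AttributeDescription */

    struct ad_table {
        struct attr_desc **ad;        /* filled in when the DB is opened */
        size_t             nad;
    };

    static struct attr_desc *resolve_ad(const struct ad_table *t, size_t adid)
    {
        return adid < t->nad ? t->ad[adid] : NULL;
    }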
-=-=-=-
Next was to eliminate some redundant navigation of the dn2id index. It was doing essentially the same traversal twice on each search candidate - once to determine if the candidate belonged to the search scope, and once to assemble the entryDN. With this change the total search came down to 3,767,565,531 instructions. The top functions were:
1,018,812,205  memnrcmp
  528,501,424  mdb_entry_decode
  463,442,142  mdb_search_node
  212,601,386  mdb_search_page
  167,097,240  strncasecmp
  151,523,868  attr_clean
   93,001,026  mdb_cursor_set
   83,500,204  is_ad_subtype
   80,889,277  avl_find
   80,496,022  mdb_get_page

(commit 6c8e4f2671b6aed41cd5098725048dbe2513612c)
-=-=-=-
The next step was a minor libmdb cleanup, restructuring it so that key/data pairs were always guaranteed to start on a 2-byte aligned address. (While x86 didn't seem to care, CPUs like SPARC would SIGBUS otherwise.) This restructuring brought execution down to 3,441,377,693 instructions - making code more portable is always a good thing, even if the impact is minor. The top functions were:
1,018,873,702  memnrcmp
  537,251,494  mdb_entry_decode
  463,465,251  mdb_search_node
  212,597,818  mdb_search_page
  151,523,868  attr_clean
   93,001,026  mdb_cursor_set
   83,500,204  is_ad_subtype
   80,496,022  mdb_get_page
   63,750,282  attrs_alloc
   62,500,669  mdb_search

(commit 293df78b2be77d6d153fd7052cc62d3377dc5501)
-=-=-=-
Next, it was finally time to do something about memnrcmp. The first step was simply writing a more integer-oriented function, cintcmp, which operates on unaligned integers a byte at a time. This had only a small impact, bringing execution down to 3,342,205,373 instructions.
919,700,412  cintcmp

(commit f9c8796d0b3ed9bc0f51c76bb28609121b1e2eec)

The rest of the trace profile is basically identical to the previous one.
-=-=-=-
Next was a bit of libmdb code cleanup and restructuring. The performance change was minimal, bringing total execution down to 3,310,510,255 instructions. The trace profile is mostly the same as the previous.

(commit dac3fae3b540841ae753bea16f3b353e2124c43d)
-=-=-=-
Next was a further speedup of cintcmp, changing it to operate on unsigned shorts instead of just chars, now that we had guaranteed 2-byte alignment. This brought execution down to 2,832,817,987 instructions, and shook up the profile a little bit:
537,251,494  mdb_entry_decode
475,922,450  cintcmp
457,377,978  mdb_search_node
176,828,204  mdb_search_page_root
151,523,868  attr_clean
 83,500,204  is_ad_subtype

(commit 1b69295a48cca409ed0c2f3fe655325e00f55ce2)
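A rough sketch of the idea (not the exact cintcmp code; it assumes equal-length keys stored least-significant word first, so the scan runs from the end of the key):

    /* Sketch only, not the exact cintcmp: with 2-byte alignment guaranteed,
     * equal-length integer keys can be compared an unsigned short at a time
     * instead of a byte at a time. */
    #include <stddef.h>

    static int short_key_cmp(const unsigned short *a, const unsigned short *b,
                             size_t nshorts)
    {
        const unsigned short *pa = a + nshorts;
        const unsigned short *pb = b + nshorts;

        while (pa > a) {
            int diff = *--pa - *--pb;
            if (diff)
                return diff;
        }
        return 0;
    }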
-=-=-=-
Next was a further rewrite of mdb_entry_decode, using tmpmem allocs instead of the slapd central entry_alloc/attrs_alloc functions. This brought execution down to 2,483,077,294 instructions. The profile is much like the previous, but attr_clean and its associated functions disappear. The top functions were:
535,751,482  mdb_entry_decode
475,922,450  cintcmp
457,377,978  mdb_search_node
176,828,204  mdb_search_page_root
 83,500,204  is_ad_subtype

(commit f72d65b77aa6cd4439ee0ad80b498f4ed707a278)
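The gain here is mostly about what no longer has to happen: with a per-operation temporary allocator, the decoded entry is carved out of one slab and released all at once, so there's no per-attribute malloc/free or cleanup pass. A rough sketch of that allocation pattern (not slapd's actual slab allocator):

    /* Illustrative sketch of the per-operation temporary allocator idea:
     * allocations are bumped out of one preallocated block and released
     * all at once when the operation ends. */
    #include <stddef.h>

    struct tmpmem {
        char  *base;
        size_t size, used;
    };

    static void *tmp_alloc(struct tmpmem *t, size_t n)
    {
        void *p;

        n = (n + 7) & ~(size_t)7;           /* keep 8-byte alignment */
        if (t->used + n > t->size)
            return NULL;                    /* a real allocator would grow here */
        p = t->base + t->used;
        t->used += n;
        return p;
    }

    static void tmp_reset(struct tmpmem *t)  /* "free" everything at once */
    {
        t->used = 0;
    }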
-=-=-=-
Next was another tweak for mdb_search(), keeping the cursor on the id2entry database for the duration of the search. This eliminated a lot of mdb_search_page overhead since usually the cursor was already on the right page when the next entry was being fetched. This change brought execution down to 2,241,832,009 instructions. The top functions were:
535,751,482  mdb_entry_decode
391,064,166  cintcmp
350,350,674  mdb_search_node
139,905,148  mdb_search_page_root
 93,256,473  mdb_cursor_set
 83,500,204  is_ad_subtype

(commit 54ced52c047425b432075946dd2997c52f020de0)
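The shape of the loop, roughly (a sketch with placeholder types and candidate list, not the actual back-mdb search code):

    /* Sketch of the idea only: keep a single cursor open on the id2entry
     * database for the whole search, so fetching the next candidate usually
     * finds the cursor already on the right page. */
    #include <stddef.h>
    #include "mdb.h"

    static int scan_candidates(MDB_txn *txn, MDB_dbi id2entry,
                               const unsigned long *ids, size_t nids)
    {
        MDB_cursor *mc;
        MDB_val key, data;
        size_t i;
        int rc = mdb_cursor_open(txn, id2entry, &mc);

        if (rc != 0)
            return rc;
        for (i = 0; i < nids; i++) {
            unsigned long id = ids[i];
            key.mv_size = sizeof(id);
            key.mv_data = &id;
            rc = mdb_cursor_get(mc, &key, &data, MDB_SET);
            if (rc == MDB_NOTFOUND) {
                rc = 0;                 /* skip missing IDs */
                continue;
            }
            if (rc != 0)
                break;
            /* decode the entry in data.mv_data and test it against the filter */
        }
        mdb_cursor_close(mc);
        return rc;
    }

-=-=-=-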
Finally (as of this morning) a bit of cleanup and restructuring in libmdb, to eliminate a bunch of cruft in the previous data structure layout. This change was more for esthetic reasons than for performance, but it still offered a small gain, with execution at 2,232,560,312 instructions. The top functions:
535,751,482  mdb_entry_decode
391,064,166  cintcmp
342,681,498  mdb_search_node
129,900,173  mdb_search_page_root
 90,006,339  mdb_cursor_set
 83,500,204  is_ad_subtype

(commit 25529a4c36903d0456b1251712de32f665850029)
-=-=-=-
At the outset libmdb had its clumsy parts. Now the libmdb/mdb.o text+data is only 31255 bytes - it's tight and very efficient. The whole DB engine can execute within a CPU's L1 instruction cache, with room to spare.
Howard Chu wrote:
With that said, slapadd -q for a 3.2M entry database on a tmpfs:
back-hdb:  real 75m32.678s   user 84m31.733s   sys 1m0.316s
back-mdb:  real 63m51.048s   user 50m23.125s   sys 13m27.958s
On an XFS partition, the same job took:

back-hdb:  real 80m34.403s   user 86m2.439s    sys 1m39.662s
back-mdb:  real 85m48.598s   user 49m40.606s   sys 14m48.668s
Note that back-hdb runs a trickle-sync thread to flush dirty DB pages in the background, which is why its user time is greater than the real time.
back-mdb actually completed the load in 64m16.19s according to slapadd's progress meter, but back-mdb performs an mdb_env_sync() on close, and that sync took the remaining 21 minutes. (back-hdb does much the same - a checkpoint on close - but it skips it in Quick mode, so to be apples-to-apples, back-mdb's final sync should have been omitted as well.)
A couple new results for back-mdb as of today.
                      first       second      slapd size
back-hdb, 10K cache   3m6.906s    1m39.835s   7.3GB
back-hdb, 5M cache    3m12.596s   0m10.984s   46.8GB
back-mdb (previous)   0m19.420s   0m16.625s   7.0GB
back-mdb (today)      0m15.041s   0m12.356s   7.8GB

Next, the time to execute multiple instances of this search was measured, using 2, 4, 8, and 16 ldapsearch instances running concurrently.

average result time    2            4            8            16
back-hdb, 5M           0m14.147s    0m17.384s    0m45.665s    17m15.114s
back-mdb (previous)    0m16.701s    0m16.688s    0m16.621s    0m16.955s
back-mdb (today)       0m12.009s    0m11.978s    0m12.048s    0m12.506s
So back-mdb is faster than back-hdb whenever there's more than one query running. Also with result times of 0m11.699s measured, back-mdb is within 7% of back-hdb's speed even in the single-query case, where hdb has zero lock contention and all it has to do is dump cached entries from RAM (i.e., back-hdb is doing practically zero work at all).
Howard Chu wrote:
So back-mdb is faster than back-hdb whenever there's more than one query running. Also with result times of 0m11.699s measured, back-mdb is within 7% of back-hdb's speed even in the single-query case, where hdb has zero lock contention and all it has to do is dump cached entries from RAM (i.e., back-hdb is doing practically zero work at all).
For comparison, the time to dd the raw DB file to /dev/null:

hyc@ada:~$ time dd if=/mnt/hyc/data/ldap/mdb/db/data.mdb of=/dev/null bs=1024k
18643+1 records in
18643+1 records out
19548712960 bytes (20 GB) copied, 11.0087 s, 1.8 GB/s

real    0m11.019s
user    0m0.000s
sys     0m11.009s
So effectively, back-mdb with all of slapd wrapped around it only adds 10% overhead compared to just copying the raw data as fast as possible.
Howard Chu wrote:
Next, the time to execute multiple instances of this search was measured, using 2, 4, 8, and 16 ldapsearch instances running concurrently.

average result time    2            4            8            16
back-hdb, 5M           0m14.147s    0m17.384s    0m45.665s    17m15.114s
back-mdb (previous)    0m16.701s    0m16.688s    0m16.621s    0m16.955s
back-mdb (today)       0m12.009s    0m11.978s    0m12.048s    0m12.506s
This result for back-hdb just didn't make any sense. Going back, I discovered that I'd made a newbie mistake - my slapd was using the libdb-4.7.so that Debian bundled, instead of the one I had built in /usr/local/lib. Apparently my LD_LIBRARY_PATH setting that I usually have in my .profile was commented out when I was working on some other stuff.
While loading a 5 million entry DB for SLAMD testing, I went back and rechecked these results and got much more reasonable numbers for hdb. Most likely the main difference is that Debian builds BDB with its default configuration for mutexes, which is a hybrid that begins with a spinlock and eventually falls back to a pthread mutex. Spinlocks are nice and fast, but only for a small number of processors. Since they use atomic instructions that are meant to lock the memory bus, the coherency traffic they generate is quite heavy, and it increases geometrically with the number of processors involved.
I always build BDB with an explicit --with-mutex=POSIX/pthreads to avoid the spinlock code. Linux futexes are decently fast, and scale much better as the number of processors goes up.
With slapd linked against my build of BDB 4.7, and using the 5 million entry database instead of the 3.2M entry database I used before, the numbers make much more sense.
slapadd -q times:

back-hdb:  real 66m09.831s   user 115m52.374s   sys 5m15.860s
back-mdb:  real 29m33.212s   user 22m21.264s    sys 7m11.851s
ldapsearch scanning the entire DB:

                first       second      slapd size   DB size
back-hdb, 5M    4m15.395s   0m16.204s   26GB         15.6GB
back-mdb        0m14.725s   0m10.807s   10GB         12.8GB
multiple concurrent scans, average result time:

                2           4           8           16
back-hdb, 5M    0m24.617s   0m32.171s   1m04.817s   3m04.464s
back-mdb        0m10.789s   0m10.842s   0m10.931s   0m12.023s
You can see that up to 4 concurrent searches, the BDB spinlock probably would have been faster. Above that, you need to get rid of the spinlocks. If I had realized I was using the default BDB build, I could of course have configured the BDB environment with set_tas_spins in the DB_CONFIG file. We used to always set this to 1, overriding the BDB default of 50 * number of CPUs, before we decided to omit spinlocks entirely at configure time.
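For reference, that override is a one-line entry in the environment's DB_CONFIG file:

    # allow at most one test-and-set spin before blocking on the
    # underlying mutex, instead of BDB's default of 50 * number of CPUs
    set_tas_spins 1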
But I think this also illustrates another facet of MDB - reducing config complexity, so there's a much smaller range of mistakes that can be made.
Re: the slapd process size - in my original test I configured a 32GB BDB environment cache. This particular LDIF only needed an 8GB cache, so that's the size I used this time around. The 47GB size reported before was the Virtual size of the process, not the Resident size. That was also a mistake; all of the other numbers are Resident size.
When contemplating the design of MDB I had originally estimated that we could save somewhere around 2-3x the RAM compared to BDB. With slapd running 2.7x larger with BDB than MDB on this test DB, that estimate has been proved to be correct.
The MVCC approach has also proved its value, with no bottlenecks for readers and response time essentially flat as number of CPUs scales up.
There are still problem areas that need work, but it looks like we're on the right track, and what started out as possibly-a-good-idea is delivering on the promise.
Howard Chu wrote:
The MVCC approach has also proved its value, with no bottlenecks for readers and response time essentially flat as number of CPUs scales up.
slamd results have been interesting. The same set of clients that easily push back-hdb up to 62,000 searches/second at 1485% CPU use are gasping and dying, pushing back-mdb over 75,000 searches/second at only 1000% CPU use. Once again I need to bring some more load generator machines online in order to actually max out slapd.
Howard Chu wrote:
slamd results have been interesting. The same set of clients that easily push back-hdb up to 62,000 searches/second at 1485% CPU use are gasping and dying, pushing back-mdb over 75,000 searches/second at only 1000% CPU use. Once again I need to bring some more load generator machines online in order to actually max out slapd.
Here's a report from a slamd run I just completed.
http://highlandsun.com/hyc/slamd-mdb/
The report includes a ModRate job and a SearchRate job; both were run concurrently. Aside from the fact that the average search rate is over 85,000 searches/second (on a machine that I previously thought was maxed out at 63,000), more interesting is the peak of almost 107,000 searches/second. The result curve drops, flattens, and then rises again, which shows the influence of the writers occupying server threads and making them unavailable for readers, until the writer job finishes.
On this run slapd hit 1300% CPU. Core 0, which was fielding ethernet interrupts, was at 80% handling soft interrupts. I have no idea whether we can generate enough load to hit 90% or more there; it seems unlikely.
The write rate is pretty slow, as we already knew. I frankly don't see it improving very much, given the single-writer nature of MDB.
Gavin Henry wrote:
Wow! What are the test servers specs?
It's ada.openldap.org; you already have an account there and can see for yourself.
It's an HP DL585 G5 with 4 AMD Opteron 8354 quad-core processors. The same machine that produced these results last year:
http://highlandsun.com/hyc/slamd/
Howard Chu wrote:
The write rate is pretty slow, as we already knew. I frankly don't see it improving very much, given the single-writer nature of MDB.
I guess "slow" is relative. My previous modrate tests didn't touch any indices. I've added entryCSN,entryUUID eq indexing to the config (since that's common for a syncrepl environment) and gotten much different results.
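In slapd.conf terms that's just the usual index directive, applied to both backends for this test:

    index entryCSN,entryUUID eq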
Basically for back-hdb there's a continuous stream of deadlocks from updating the entryCSN index. This pushes the maximum mod rate down to only 3400/sec. With this configuration back-mdb was actually faster, at 3800/sec.
With accesslog thrown into the mix, back-mdb drops to 2500/sec while back-hdb drops to 1700/sec. So for any realistic scenario, back-mdb beats back-hdb all around.