(ITS#6961) back-bdb/hdb search performance issue - openldap-bugs

1 Jun 2011


      Full_Name: Howard Chu
Version: HEAD/RE24
OS: 
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (76.94.188.8)
Submitted by: hyc
See this thread for background
http://www.openldap.org/lists/openldap-technical/201011/msg00146.html
Currently when returning the entries in an IDL range, back-bdb search.c just
iterates through all the values in the range. If a lot of entries have been
deleted within this range a lot of time is wasted calling into BDB to fetch
entryIDs that don't exist. Strictly speaking this performance loss can be
avoided by simply using a cursor on the id2entry DB, and using DB_NEXT to fetch
the next entry that exists.
However, while that approach provides uniform search performance, it's also
always about 2x slower than the best-case where we just iterate through the IDL.
Also if we keep a cursor like this open for the entire search operation, it will
leave all of those entries read-locked until the search completes. (Note that an
MVCC approach, like for back-mdb, would not have this latter problem.)
The current patch continues to use the IDL iterator, but if the next iterated ID
comes up as nonexistent, then we check for the next ID in the DB. We use the
same approach as the online indexer to avoid taking any long-term locks: create
a cursor, use DB_SET_RANGE to get the next ID, and then close the cursor, thus
freeing its locks. This way the ideal case (all consecutive entries exist)
remains fast, and the performance hit of asking BDB is only incurred when
necessary. Testing shows that search performance remains uniform and within 1%
of the ideal case.