Full_Name: Howard Chu Version: HEAD/RE24 OS: URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (220.127.116.11) Submitted by: hyc
See this thread for background http://www.openldap.org/lists/openldap-technical/201011/msg00146.html
Currently when returning the entries in an IDL range, back-bdb search.c just iterates through all the values in the range. If a lot of entries have been deleted within this range a lot of time is wasted calling into BDB to fetch entryIDs that don't exist. Strictly speaking this performance loss can be avoided by simply using a cursor on the id2entry DB, and using DB_NEXT to fetch the next entry that exists.
However, while that approach provides uniform search performance, it's also always about 2x slower than the best-case where we just iterate through the IDL. Also if we keep a cursor like this open for the entire search operation, it will leave all of those entries read-locked until the search completes. (Note that an MVCC approach, like for back-mdb, would not have this latter problem.)
The current patch continues to use the IDL iterator, but if the next iterated ID comes up as nonexistent, then we check for the next ID in the DB. We use the same approach as the online indexer to avoid taking any long-term locks: create a cursor, use DB_SET_RANGE to get the next ID, and then close the cursor, thus freeing its locks. This way the ideal case (all consecutive entries exist) remains fast, and the performance hit of asking BDB is only incurred when necessary. Testing shows that search performance remains uniform and within 1% of the ideal case.