Hi,
I would like to bring up the problem with back-mdb and alias-dereferening again. There is Bug ITS#7657 for this already, and in http://www.openldap.org/lists/openldap-technical/201504/msg00008.html the problem was also addressed. Probably in other places as well, but I did not find anything else in the archives.
I would like to reconfigure all my LDAP clusters to use back-mdb, however, I can't: my LDAP tree has about 2 Mio entries, 600000 of which are alias entries.
I observe that when using back-mdb and making search request with search scope "sub" and deref=always, than this search request will take seconds before a response... in contrast, with deref=never, the same search request takes only millisecs. It does not matter whether there are aliases in the search scope or not... with deref=always it always takes seconds, each request keeping one CPU thread very busy.
So if many requests come in with sub/always (which unfortunately happens a lot because many LDAP clients just set deref=always as a default) it is a sure way to bring down an LDAP cluster configured with back-mdb.
The problem has haunted me for a while (incidents, incidents :-), and I have tried to find out where it comes from. I would like to share my conjectures here.
A search request with sub/always eventually ends up in the function search_aliases(..), located in back-mdb/search.c (code/linenumbers is from v2.4.46 ):
157 /* Find all aliases in database */ 158 MDB_IDL_ZERO( aliases ); 159 rs->sr_err = mdb_filter_candidates( op, isc->mt, &af, aliases, 160 curscop, visited ); 161 if (rs->sr_err != LDAP_SUCCESS || MDB_IDL_IS_ZERO( aliases )) { 162 return rs->sr_err; 163 } 164 oldsubs[0] = 1; 165 oldsubs[1] = e_id; 166 167 MDB_IDL_ZERO( visited ); 168 MDB_IDL_ZERO( newsubs ); 169 170 cursoro = 0; 171 ido = mdb_idl_first( oldsubs, &cursoro ); 172 173 for (;;) { 174 /* Set curscop to only the aliases in the current scope. Start with 175 * all the aliases, then get the intersection with the scope. 176 */ 177 rs->sr_err = mdb_idscope( op, isc->mt, e_id, aliases, curscop );
After adding a bit of extra debug code, I could see in the logs that the time is actually spent in line 177, i.e. the mdb_idscope function.
What I believe is happening is the following: in line 159 the index list of all aliases in the DB is constructed and put into the "aliases" list. The list comes from the objectClass index, but since there are so many alias entries, the index contains not an explicit list, but a range... a range that covers basically *all* 2 Mio entries in the DB. (this happens also with the hdb backend, so the fact that "aliases" is a range is perhaps part, but not cause of the problem).
Then the for loop is entered in 173, and mdb_idscope in 177 is called. The idea there is to compute the intersection of alias list "aliases" and search scope, i.e. get all aliases that are in the actual search scope.
This is done by walking through the 2 millions indexes represented by the range in "aliases", and deciding whether the id is in the scope or not. This, if I understand correctly, by walking up the LDAP tree from leaf towards root (actually walking up the structure of the dn2id DB) ... if we encounter the search base on the way, the alias is part of the search scope, if we reach the root, then not.
Computing this intersection is what costs time, and it does not depend on whether the intersection is empty or not.
Apparently it is done differently in back_hdb and by extension, back_bdb. I have not analysed that code there in detail but I believe the intersection is computed by actually looking at 2 different ID lists, one containing the aliases (range) again, the other one the actual IDs of the search scope. Which, it appears, is more efficient.
So the problem seems to be to compute the relevant aliases to start expanding from. The way it is done now the complexity seems to be linear in the number of aliases, if not number of entries on the LDAP tree.
For my own use cases at least it would be much more efficient to just do a tree traversal of the search scope, pick up the relevant aliases and put them in the list. I believe even for large trees this would work quite well... if I search my whole tree for entries with objectClass=alias (deref=never, of course), this is amazingly fast. I am currently looking into how something like that could be implemented, but I need to dive much deeper in the code before I could come up with something worth considering.
What are the views of the developers on this? Does the analysis sound plausible? Could a fix be easily obtained?
I'm willing to try out any patch thrown my way. Somewhere is written that mdb will replace hdb/bdb eventually ... I believe this problem needs a fix before that can happen.
Henrik