On Mon, Jul 15, 2019 at 07:44:05PM +0100, Howard Chu wrote:
Howard Chu wrote:
Henrik Bohnenkamp wrote:
On Mon, Jul 15, 2019 at 02:26:59PM +0100, Howard Chu wrote:
Fyi, on our problematic test database with 11M entries and 3.7M aliases, a search with -a always , starting from the DB suffix, took 4 minutes without this patch, and 1235 minutes with this patch.
Needless to say, that's not looking good. Still checking other test cases.
I start to understand why this is the case. I believe that our two approaches to solve the problem each have their Nemesis... I have constructed a test case where the patched slapd needs 2 minutes (1 minute on a repeat) for a search for people objects, but the master branch has yet to finish after 30 minutes (and might not finish today).
The test case is modeled similar to what is done in my company: group objects have not only member entries for people, but each member has an alias to the corresponding person object, these aliases being children to the respective group object. Furthermore, subgroups/group inclusion is modeled by adding aliases (also as children) to the group object, pointing to the group objects representing the included groups. So there is some recursive descent to get all person objects of all subgroups. In total I have 5 mio entries, 3 mio of which are aliases. There are no cycles, but in this test case the depth of the recursion is absolutely unrealistic.
I believe, in the master-slapd, each step down in the recursion needs a call to mdb_idscope, which in turn goes through all 3 mio aliases to compute the intersection with the scope of the group object (that's what I remember, but I have not looked at the algorithm for a year now).
The patch traverses in this case just very small sub-trees (although many of those) to collect the relevant aliases and continues from there. That seems to be much more efficient here.
If I understand your description above correctly, my patch sucks with your test case because it does a full traversal of the LDAP tree to collect all aliases ... with 11 mio entries that takes a while. The master branch, however, just computes intersection with the scope once and continues from there, which must be much faster.
I actually do not know what to conclude from all this. I wonder however, if a combination of both approaches could be beneficial. With the patch the diskNode of an entry contains not only the number of children in the scope, but also the number of aliases. If the total number of aliases in the database is large, but the search scope is small, the tree traversal might be faster. If the number of entries in the scope is big, and there are aliases in the search scope, the intersection in mdb_idscope might be more efficient. If there are no aliases in the search scope, nothing has to be done at all.