back-mdb was first feature-complete a week ago. Between then and now I've spent a bit of time profiling its behavior and eliminating hot spots. For this work I used a test database of 250,000 synthetic entries, running under valgrind's callgrind tool and my FunctionCheck profiler. (Occasionally the two tools would disagree on where hot spots were, so I wound up targeting both.) The callgrind output is available on http://highlandsun.com/hyc/mdb_search/ for reference.
With the initial back-mdb code, which was basically back-bdb with all of its caching logic removed, an ldapsearch that scanned the entire DB ran in 8,875,046,744 instructions. The biggest hot spot was memnrcmp(), which is the libmdb function for comparing two strings in reverse byte order. The top 10 functions were:
1,828,659,111 memnrcmp 1,151,097,812 strncasecmp 911,117,166 mdb_search_node 628,388,775 avl_find 612,549,560 ad_keystring 600,502,442 entry_decode 494,045,070 slap_bv2ad 376,177,636 mdb_search_page 285,882,273 attr_index_name_cmp 199,751,242 mdb_cursor_set
(That basically corresponded to commit e5b1dce6a7904e0eb31029959865730fc813ce57)
-=-=-=-
The next step was to eliminate slap_bv2ad() from the entry_decode() path, using numeric IDs for attributeDescriptions in the database. Rewriting slapd entry_decode as mdb_entry_decode() with this change brought the total search execution down to 5,410,551,759 instructions. The top 10 functions were:
1,823,974,563 memnrcmp 796,099,511 mdb_search_node 528,502,136 mdb_entry_decode 376,251,165 mdb_search_page 199,751,329 mdb_cursor_set 167,097,364 strncasecmp 151,524,069 attr_clean 143,741,422 mdb_get_page 87,494,764 cursor_push_page 83,500,204 is_ad_subtype (commit 1e32fcf099ba8c15333365fe68aefa5217ae3d8c)
-=-=-=-
Next was to eliminate some redundant navigation of the dn2id index. It was doing essentially the same traversal twice on each search candidate - once to determine if the candidate belonged to the search scope, and once to assemble the entryDN. With this change the total search came down to 3,767,565,531 instructions. The top functions were:
1,018,812,205 memnrcmp 528,501,424 mdb_entry_decode 463,442,142 mdb_search_node 212,601,386 mdb_search_page 167,097,240 strncasecmp 151,523,868 attr_clean 93,001,026 mdb_cursor_set 83,500,204 is_ad_subtype 80,889,277 avl_find 80,496,022 mdb_get_page (commit 6c8e4f2671b6aed41cd5098725048dbe2513612c)
-=-=-=-
The next step was a minor libmdb cleanup, restructuring it so that key/data pairs were always guaranteed to start on a 2-byte aligned address. (While x86 didn't seem to care, CPUs like SPARC would SIGBUS otherwise.) This restructuring brought execution down to 3,441,377,693 instructions - making code more portable is always a good thing, even if the impact is minor. The top functions were
1,018,873,702 memnrcmp 537,251,494 mdb_entry_decode 463,465,251 mdb_search_node 212,597,818 mdb_search_page 151,523,868 attr_clean 93,001,026 mdb_cursor_set 83,500,204 is_ad_subtype 80,496,022 mdb_get_page 63,750,282 attrs_alloc 62,500,669 mdb_search (commit 293df78b2be77d6d153fd7052cc62d3377dc5501)
-=-=-=-
Next, finally start doing something about memnrcmp. First is simply writing a more integer-oriented function cintcmp, which operates on unaligned integers a byte at a time. This had only a small impact as well, getting us down only a bit to 3,342,205,373 instructions.
919,700,412 cintcmp (commit f9c8796d0b3ed9bc0f51c76bb28609121b1e2eec) The rest of the trace profile is basically identical to the previous one.
-=-=-=-
Next was a bit of libmdb code cleanup and restructuring. The performance change was minimal, bringing total execution down to 3,310,510,255 instructions. The trace profile is mostly the same as the previous. commit dac3fae3b540841ae753bea16f3b353e2124c43d)
-=-=-=-
Next was a further speedup of cintcmp, changing it to operate on unsigned shorts instead of just chars, now that we had guaranteed 2-byte alignment. This brought execution down to 2,832,817,987 instructions, and shook up the profile a little bit:
537,251,494 mdb_entry_decode 475,922,450 cintcmp 457,377,978 mdb_search_node 176,828,204 mdb_search_page_root 151,523,868 attr_clean 83,500,204 is_ad_subtype (commit 1b69295a48cca409ed0c2f3fe655325e00f55ce2)
-=-=-=-
Next was a further rewrite of mdb_entry_decode, using tmpmem allocs instead of the slapd central entry_alloc/attrs_alloc functions. This brought execution down to 2,483,077,294 instructions. The profile is much like the previous, but attr_clean and its associated functions disappear. The top functions were:
535,751,482 mdb_entry_decode 475,922,450 cintcmp 457,377,978 mdb_search_node 176,828,204 mdb_search_page_root 83,500,204 is_ad_subtype (commit f72d65b77aa6cd4439ee0ad80b498f4ed707a278)
-=-=-=-
Next was another tweak for mdb_search(), keeping the cursor on the id2entry database for the duration of the search. This eliminated a lot of mdb_search_page overhead since usually the cursor was already on the right page when the next entry was being fetched. This change brought execution down to 2,241,832,009 instructions. The top functions were:
535,751,482 mdb_entry_decode 391,064,166 cintcmp 350,350,674 mdb_search_node 139,905,148 mdb_search_page_root 93,256,473 mdb_cursor_set 83,500,204 is_ad_subtype (commit 54ced52c047425b432075946dd2997c52f020de0) -=-=-=-
Finally (as of this morning) a bit of cleanup and restructuring in libmdb, to eliminate a bunch of cruft in the previous data structure layout. This change was more for esthetic reasons than for performance, but it still offered a small gain, with execution at 2,232,560,312 instructions. The top functions:
535,751,482 mdb_entry_decode 391,064,166 cintcmp 342,681,498 mdb_search_node 129,900,173 mdb_search_page_root 90,006,339 mdb_cursor_set 83,500,204 is_ad_subtype (commit 25529a4c36903d0456b1251712de32f665850029)
-=-=-=-
At the outset libmdb had its clumsy parts. Now the libmdb/mdb.o text+data is only 31255 bytes - it's tight and very efficient. The entire DB engine can execute entirely within a CPU's L1 instruction cache, with room to spare.