On Mon, 2007-11-12 at 15:15 -0800, Quanah Gibson-Mount wrote:
--On Monday, November 12, 2007 7:02 AM +0000 hyc@symas.com wrote:
This isn't a lot of information to go on. If you can create a test program that shows the problem occurring, using dummy data, that would help. --
Also, just some general information on what it is you are doing, with a bit more explanation.
I have now done several days of testing and think I have tracked what is wrong. All my tests have been done in 2.3.38.
What my program does is move one person from one branch to another. In simplified form, it does the following:
1) Searches for the person entry using base o=xxx and filter uid=yyyy.
2) Does a modrdn from
   cn=qqq+uid=yyy,a=aaaa,b=bbbb,c=cccc,o=xxx
   to
   cn=qqq+uid=yyy,d=dddd,e=eeee,c=cccc,o=xxx
I now run this against a newly started LDAP server (that is, the cache has not been filled yet). This is a special case that I found triggers what is probably the same bug I hit previously, although back then the server may have been running for a long time.
My analysis of all my debug prints indicates the following. During 1) above, the person entry is located and hdb_cache_find_parent is called, which calls hdb_dn2id_parent to find the way to the root. From what I can see, this constructs cache entries with a single kid entry in bei_kids. It does not load all kids of each entry found along the path to the root.
Next, in 2), modrdn is called, and with it bdb_cache_modrdn. This removes the person entry from the a=aaaa entry. Because the a=aaaa entry was cached during 1) with just that one child set, rather than with all children loaded from disk, it now has no children (bei_kids is NULL), so its state is set to CACHE_ENTRY_NO_KIDS.
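The sequence in 1) and 2) can be illustrated with a toy model. This is only a sketch of the behaviour described above, not the back-hdb code: every `toy_` name, the two counters, and the figure of three children on disk are assumptions made for illustration.

```c
#include <assert.h>

/* Toy model only: these names are hypothetical, not the back-hdb structures. */
enum toy_state { TOY_STATE_UNKNOWN, TOY_STATE_NO_KIDS };

struct toy_entry {
    int cached_kids;        /* children currently linked in the cache */
    int disk_kids;          /* children that really exist on disk */
    enum toy_state state;   /* stands in for CACHE_ENTRY_NO_KIDS */
};

/* Step 1: walking from the found entry up to the root links only the
 * one child that lies on that path. */
static void toy_find_parent(struct toy_entry *parent)
{
    parent->cached_kids = 1;
}

/* Step 2: modrdn unlinks the moved child. If the cached list is then
 * empty, the parent is marked as having no children at all. */
static void toy_modrdn_remove(struct toy_entry *parent)
{
    assert(parent->cached_kids > 0);
    parent->cached_kids--;
    parent->disk_kids--;
    if (parent->cached_kids == 0)
        parent->state = TOY_STATE_NO_KIDS;
}

/* Returns 1 when the cache claims "no kids" but disk still has some. */
static int toy_is_inconsistent(const struct toy_entry *parent)
{
    return parent->state == TOY_STATE_NO_KIDS && parent->disk_kids > 0;
}

/* Replays steps 1) and 2) for a parent with three children on disk. */
static int toy_demo(void)
{
    struct toy_entry a = { 0, 3, TOY_STATE_UNKNOWN };
    toy_find_parent(&a);
    toy_modrdn_remove(&a);
    return toy_is_inconsistent(&a);
}
```

In this model `toy_demo()` reports an inconsistency: the parent is flagged as childless while two children remain on disk, which matches the symptom described in the report.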
If I then do a search with base c=cccc,o=xxx, the hdb_dn2idl routine does not find all entries, because in hdb_dn2idl_internal the cached entry for a=aaaa,b=bbbb,c=cccc,o=xxx has state CACHE_ENTRY_NO_KIDS and is ignored. If the modrdn in step 2) had loaded all kids from disk before deleting the entry from its parent's list of kids, it should have worked.
So the problem, if my analysis is correct, is that cache entries are sometimes created without their children having been loaded from disk, and then another cache routine changes the number of children in the cache without first loading the correct set of children from disk.
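One way to picture the remedy suggested by this analysis is a flag recording whether a parent's children were fully loaded, checked before any removal. The sketch below is a toy model under that assumption, not a patch for back-hdb: the `fix_` names, the `kids_complete` flag, and the counters are all hypothetical.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not the back-hdb structures. */
struct fix_entry {
    int cached_kids;    /* children currently linked in the cache */
    int disk_kids;      /* children that really exist on disk */
    int kids_complete;  /* hypothetical flag: 1 once all kids were read */
    int no_kids;        /* stands in for CACHE_ENTRY_NO_KIDS */
};

/* Pretend to read every child of the parent from disk into the cache. */
static void fix_load_all_kids(struct fix_entry *parent)
{
    parent->cached_kids = parent->disk_kids;
    parent->kids_complete = 1;
}

/* Remove one child, but only after the cached list is known to be
 * complete, so that "no kids" is decided from the full set. */
static void fix_remove_kid(struct fix_entry *parent)
{
    if (!parent->kids_complete)
        fix_load_all_kids(parent);
    assert(parent->cached_kids > 0);
    parent->cached_kids--;
    parent->disk_kids--;
    parent->no_kids = (parent->cached_kids == 0);
}

/* Parent cached with one of its three children, as after step 1). */
static int fix_demo(void)
{
    struct fix_entry a = { 1, 3, 0, 0 };
    fix_remove_kid(&a);
    return a.no_kids == 0 && a.cached_kids == 2;
}
```

Here the parent keeps two cached children after the move and is not flagged as childless. Whether the real cache should load the children eagerly like this, or simply avoid setting the no-kids state for partially populated entries, is a design question for someone who knows the code.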
If this looks correct to you, what code should I add to fix it? It would be better if one of you who knows the code better than me could do that. I can test and see if it works.
I hope this is the only place in the cache handling of an entry's children where this happens, though maybe someone with better knowledge of the code can identify others.
I hope this is the bug, as I have spent many days tracking it down and need to get back to my normal work for my company.
Regards,
Dan