aweits@rit.edu wrote:
Full_Name: Andrew Elble
Version: 2.4.24 / CVS Head
OS: Solaris / MacOS
URL:
Submission from: (NULL) (129.21.6.207)
Thanks for the detailed report.
We are using sortvals on both member and memberUid. We have been seeing duplicate member/memberUid attributes on some objects that have been modified, and those attributes are also no longer sorted. There seemed to be a correlation between modifications that hit deadlocks and the objects that ended up with duplicate member/memberUid attributes. We put the seqmod overlay in place; this reduced the number of occurrences of the issue but did not eliminate them.
Upon further investigation, I discovered that the sorting behavior could be bypassed if the object was created without any instance of the attribute that has sorting enabled.
It would seem that attr_merge() (in attr.c) should have something like this:
    if ( *a == NULL ) {
        *a = attr_alloc( desc );
        if ( desc->ad_type->sat_flags & SLAP_AT_SORTED_VAL ) {
            (*a)->a_flags |= SLAP_ATTR_SORTED_VALS;
        }
    } else {
This is now fixed in HEAD in value.c.
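For anyone following along, the idea behind the suggested change can be shown with a small toy model. This is only a sketch: the `toy_*` types, flags and function below are hypothetical stand-ins, not the real slapd structures, and it is not the change that went into value.c. It shows only that an attribute allocated on demand should inherit the "sorted" flag from its type, whether it is created with the entry or added later.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for slapd's schema and attribute flags. */
#define TOY_AT_SORTED_VAL    0x1u  /* schema: values must be kept sorted */
#define TOY_ATTR_SORTED_VALS 0x1u  /* instance: values are kept sorted */

typedef struct toy_attr_type {
    unsigned flags;
} toy_attr_type;

typedef struct toy_attr {
    const toy_attr_type *type;
    unsigned flags;
    struct toy_attr *next;
} toy_attr;

/* Find the attribute of the given type in the list, or append a new one.
 * A newly allocated attribute inherits the sorted flag from its type.
 * Returns NULL only if allocation fails. */
static toy_attr *toy_attr_get_or_create(toy_attr **list,
                                        const toy_attr_type *type)
{
    toy_attr **a;

    assert(list != NULL && type != NULL);

    /* Walk to the matching attribute or to the end of the list. */
    for (a = list; *a != NULL; a = &(*a)->next) {
        if ((*a)->type == type)
            return *a;
    }

    *a = calloc(1, sizeof(**a));
    if (*a == NULL)
        return NULL;
    (*a)->type = type;
    if (type->flags & TOY_AT_SORTED_VAL)
        (*a)->flags |= TOY_ATTR_SORTED_VALS;
    return *a;
}
```

Calling `toy_attr_get_or_create()` on an empty list with a type that carries `TOY_AT_SORTED_VAL` yields an attribute with `TOY_ATTR_SORTED_VALS` already set, which is the property the report found missing.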
Further pursuing the issue, I started to focus on the index deletion code that was changed as a part of ITS#5183. Specifically, the portion of code within bdb_modify_internal() (in back-bdb/modify.c) that is commented:
/* Move deleted values to end of array */
This code modifies save_attrs, which appears to be a pointer to memory that resides within the cache. If a deadlock occurs, these changes are not reverted, so the entry in the cache is left corrupted. I replaced this code with the pre-ITS#5183 code and I am no longer able to 'break' the object and insert duplicate member/memberUids.
This is now fixed in HEAD.
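The hazard described above can be illustrated with a toy model. This is a sketch under stated assumptions: `toy_entry`, `toy_move_to_end()` and `toy_modify()` are hypothetical names, the `txn_ok` argument merely simulates a deadlock, and none of this is the code that was committed to HEAD. The point is only that reordering a private copy, and publishing it on commit, leaves the cached entry intact when the transaction aborts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cached entry: a small array of integer "values". */
typedef struct toy_entry {
    int vals[4];
    size_t nvals;
} toy_entry;

/* Move the value at index idx to the end of the array, in place. */
static void toy_move_to_end(toy_entry *e, size_t idx)
{
    int v;

    assert(e != NULL && idx < e->nvals);
    v = e->vals[idx];
    /* Shift the tail down by one slot, then put the value last. */
    memmove(&e->vals[idx], &e->vals[idx + 1],
            (e->nvals - idx - 1) * sizeof(e->vals[0]));
    e->vals[e->nvals - 1] = v;
}

/* Reorder on a private copy and publish to the cache only on commit.
 * txn_ok == 0 models a deadlock that forces the transaction to abort.
 * Returns 0 on commit, -1 on abort. */
static int toy_modify(toy_entry *cached, size_t idx, int txn_ok)
{
    toy_entry work;

    assert(cached != NULL);
    work = *cached;              /* private copy; the cache is untouched */
    toy_move_to_end(&work, idx);
    if (!txn_ok)
        return -1;               /* abort: cached entry is exactly as before */
    *cached = work;              /* commit: publish the reordered values */
    return 0;
}
```

Had `toy_move_to_end()` been applied directly to `cached`, an aborted call would leave the values reordered in the cache, which is the kind of corruption the report describes.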
I also found it surprising that bdb_idl_delete_key() (in back-bdb/idl.c) calls bdb_idl_cache_del() before making any calls to the database. Is that intentional?
I see what you mean. I don't think this causes any harm though.
I can answer any questions about the specifics of the environment in which we are seeing this. It is a somewhat difficult problem to reproduce outside of our production environment. I'm not terribly familiar with the code, so I'm looking to see whether I have collected enough data here to open an ITS to have this fixed (or whether I'm just way off base).
Thanks,
Andy