Howard Chu wrote:
Howard Chu wrote:
Howard Chu wrote:
Hallvard B Furuseth wrote:
http://folk.uio.no/hbf/test008out2.tbz ldapsearch.flt = ldif.flt + cn=James A Jones 4,ou=People,dc=example,dc=com .
I just got a similar failure in RE24. Looks like the same thing Aaron reported. Too bad it takes so long to reproduce it.
Very strange. The entry could not be found because cache_find_ndn returned an EntryInfo for the previous incarnation of the entry, which contained the old entryID. This is extremely odd given that bdb_cache_delete_internal has already successfully deleted the old record from the cache, before the previous Delete completed. On the subsequent Add, bdb_cache_add fails to insert the new EntryInfo because it has the same RDN as the old.
Stranger and stranger. It looks like BerkeleyDB's transaction locking isn't working. The problem is occurring because BDB is returning the info for an entryID that is being deleted, but whose Delete transaction hasn't yet committed.
Here's a sequence of events from slapd.1.log:
conn=5 op=66 DEL dn="cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com" bdb_dn2entry("cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com") ==> bdb_delete: cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com bdb_dn2entry("cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com") => bdb_dn2id_delete( "cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com", 0x0000006b ) ... bdb_idl_delete_key: 6b @cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com bdb_idl_delete_key: 6b %ou=alumni association,ou=people,dc=example,dc=com bdb_idl_delete_key: 6b @ou=alumni association,ou=people,dc=example,dc=com <= bdb_index_read: failed (-30990) bdb_idl_delete_key: 6b @ou=people,dc=example,dc=com <= bdb_dn2id_delete 0 => index_entry_del( 107, "cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com" ) <= index_entry_del( 107, "cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com" ) success ====> bdb_cache_delete( 107 ) ...
bdb_idl_fetch_key: [b49d1940] <= bdb_substring_candidates: 3, first=12, last=107 bdb_search_candidates: id=3 first=12 last=107 => send_search_entry: conn 28 dn="cn=James A Jones 1,ou=Alumni Association,ou=People,dc=example,dc=com" ber_flush2: 146 bytes to sd 20 <= send_search_entry: conn 28 exit. => send_search_entry: conn 28 dn="cn=James A Jones 2,ou=Information Technology Division,ou=People,dc=example,dc=com" ber_flush2: 157 bytes to sd 20 <= send_search_entry: conn 28 exit. add_internal "cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com", 107 ...
bdb_delete: deleted id=0000006b dn="cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com" send_ldap_result: conn=5 op=66 p=3
The time between the "bdb_cache_delete(107)" and the "bdb_delete: deleted..." is all the txn_commit time. In between there you see that an IDL search returned a result including entryID 107, even though all of the index IDLs for that entry were already deleted. This should never happen; all of the relevant IDL and dn2id pages should still be writelocked. There's also no Deadlocks within this interval of time, so nothing aborted midstream.
I'm seeing this failure with both BDB 4.2.52 and BDB 4.6.21, and no failure in RE23, so it must be something changed between RE23 and RE24, but so far looking at the diffs I don't see anything that could have caused this change in behavior.