Howard Chu wrote:
Howard Chu wrote:
> Howard Chu wrote:
>> Hallvard B Furuseth wrote:
>>>
http://folk.uio.no/hbf/test008out2.tbz
>>> ldapsearch.flt = ldif.flt + cn=James A Jones 4,ou=People,dc=example,dc=com
>>> .
>>>
>> I just got a similar failure in RE24. Looks like the same thing Aaron
>> reported. Too bad it takes so long to reproduce it.
Very strange. The entry could not be found because cache_find_ndn
returned an
EntryInfo for the previous incarnation of the entry, which contained the old
entryID. This is extremely odd given that bdb_cache_delete_internal has
already successfully deleted the old record from the cache, before the
previous Delete completed. On the subsequent Add, bdb_cache_add fails to
insert the new EntryInfo because it has the same RDN as the old.
Stranger and stranger. It looks like BerkeleyDB's transaction locking isn't
working. The problem is occurring because BDB is returning the info for an
entryID that is being deleted, but whose Delete transaction hasn't yet committed.
Here's a sequence of events from slapd.1.log:
conn=5 op=66 DEL dn="cn=James A Jones 2,ou=Alumni
Association,ou=People,dc=example,dc=com"
bdb_dn2entry("cn=james a jones 2,ou=alumni
association,ou=people,dc=example,dc=com")
==> bdb_delete: cn=James A Jones 2,ou=Alumni
Association,ou=People,dc=example,dc=com
bdb_dn2entry("cn=james a jones 2,ou=alumni
association,ou=people,dc=example,dc=com")
=> bdb_dn2id_delete( "cn=james a jones 2,ou=alumni
association,ou=people,dc=example,dc=com", 0x0000006b )
...
bdb_idl_delete_key: 6b @cn=james a jones 2,ou=alumni
association,ou=people,dc=example,dc=com
bdb_idl_delete_key: 6b %ou=alumni association,ou=people,dc=example,dc=com
bdb_idl_delete_key: 6b @ou=alumni association,ou=people,dc=example,dc=com
<= bdb_index_read: failed (-30990)
bdb_idl_delete_key: 6b @ou=people,dc=example,dc=com
<= bdb_dn2id_delete 0
=> index_entry_del( 107, "cn=James A Jones 2,ou=Alumni
Association,ou=People,dc=example,dc=com" )
<= index_entry_del( 107, "cn=James A Jones 2,ou=Alumni
Association,ou=People,dc=example,dc=com" ) success
====> bdb_cache_delete( 107 )
...
bdb_idl_fetch_key: [b49d1940]
<= bdb_substring_candidates: 3, first=12, last=107
bdb_search_candidates: id=3 first=12 last=107
=> send_search_entry: conn 28 dn="cn=James A Jones 1,ou=Alumni
Association,ou=People,dc=example,dc=com"
ber_flush2: 146 bytes to sd 20
<= send_search_entry: conn 28 exit.
=> send_search_entry: conn 28 dn="cn=James A Jones 2,ou=Information Technology
Division,ou=People,dc=example,dc=com"
ber_flush2: 157 bytes to sd 20
<= send_search_entry: conn 28 exit.
add_internal "cn=james a jones 2,ou=alumni
association,ou=people,dc=example,dc=com", 107
...
bdb_delete: deleted id=0000006b dn="cn=James A Jones 2,ou=Alumni
Association,ou=People,dc=example,dc=com"
send_ldap_result: conn=5 op=66 p=3
The time between the "bdb_cache_delete(107)" and the "bdb_delete:
deleted..."
is all the txn_commit time. In between there you see that an IDL search
returned a result including entryID 107, even though all of the index IDLs for
that entry were already deleted. This should never happen; all of the relevant
IDL and dn2id pages should still be writelocked. There's also no Deadlocks
within this interval of time, so nothing aborted midstream.
I'm seeing this failure with both BDB 4.2.52 and BDB 4.6.21, and no failure in
RE23, so it must be something changed between RE23 and RE24, but so far
looking at the diffs I don't see anything that could have caused this change
in behavior.
--
-- Howard Chu
Chief Architect, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/