Re: test008 failed in HEAD

4 Dec 2007


      Howard Chu wrote:
...
Howard Chu wrote:
...
Howard Chu wrote:
...
Hallvard B Furuseth wrote:
...
http://folk.uio.no/hbf/test008out2.tbz
ldapsearch.flt = ldif.flt + cn=James A Jones 4,ou=People,dc=example,dc=com
.
I just got a similar failure in RE24. Looks like the same thing Aaron
reported. Too bad it takes so long to reproduce it.
...
Very strange. The entry could not be found because cache_find_ndn returned an
EntryInfo for the previous incarnation of the entry, which contained the old
entryID. This is extremely odd given that bdb_cache_delete_internal has
already successfully deleted the old record from the cache, before the
previous Delete completed. On the subsequent Add, bdb_cache_add fails to
insert the new EntryInfo because it has the same RDN as the old.
Stranger and stranger. It looks like BerkeleyDB's transaction locking isn't 
working. The problem is occurring because BDB is returning the info for an 
entryID that is being deleted, but whose Delete transaction hasn't yet committed.
Here's a sequence of events from slapd.1.log:
conn=5 op=66 DEL dn="cn=James A Jones 2,ou=Alumni 
Association,ou=People,dc=example,dc=com"
bdb_dn2entry("cn=james a jones 2,ou=alumni 
association,ou=people,dc=example,dc=com")
==> bdb_delete: cn=James A Jones 2,ou=Alumni 
Association,ou=People,dc=example,dc=com
bdb_dn2entry("cn=james a jones 2,ou=alumni 
association,ou=people,dc=example,dc=com")
=> bdb_dn2id_delete( "cn=james a jones 2,ou=alumni 
association,ou=people,dc=example,dc=com", 0x0000006b )
...
bdb_idl_delete_key: 6b @cn=james a jones 2,ou=alumni 
association,ou=people,dc=example,dc=com
bdb_idl_delete_key: 6b %ou=alumni association,ou=people,dc=example,dc=com
bdb_idl_delete_key: 6b @ou=alumni association,ou=people,dc=example,dc=com
<= bdb_index_read: failed (-30990)
bdb_idl_delete_key: 6b @ou=people,dc=example,dc=com
<= bdb_dn2id_delete 0
=> index_entry_del( 107, "cn=James A Jones 2,ou=Alumni 
Association,ou=People,dc=example,dc=com" )
<= index_entry_del( 107, "cn=James A Jones 2,ou=Alumni 
Association,ou=People,dc=example,dc=com" ) success
====> bdb_cache_delete( 107 )
...
bdb_idl_fetch_key: [b49d1940]
<= bdb_substring_candidates: 3, first=12, last=107
bdb_search_candidates: id=3 first=12 last=107
=> send_search_entry: conn 28 dn="cn=James A Jones 1,ou=Alumni 
Association,ou=People,dc=example,dc=com"
ber_flush2: 146 bytes to sd 20
<= send_search_entry: conn 28 exit.
=> send_search_entry: conn 28 dn="cn=James A Jones 2,ou=Information Technology 
Division,ou=People,dc=example,dc=com"
ber_flush2: 157 bytes to sd 20
<= send_search_entry: conn 28 exit.
add_internal "cn=james a jones 2,ou=alumni 
association,ou=people,dc=example,dc=com", 107
...
bdb_delete: deleted id=0000006b dn="cn=James A Jones 2,ou=Alumni 
Association,ou=People,dc=example,dc=com"
send_ldap_result: conn=5 op=66 p=3
The time between the "bdb_cache_delete(107)" and the "bdb_delete: deleted..." 
is all the txn_commit time. In between there you see that an IDL search 
returned a result including entryID 107, even though all of the index IDLs for 
that entry were already deleted. This should never happen; all of the relevant 
IDL and dn2id pages should still be writelocked. There's also no Deadlocks 
within this interval of time, so nothing aborted midstream.
I'm seeing this failure with both BDB 4.2.52 and BDB 4.6.21, and no failure in 
RE23, so it must be something changed between RE23 and RE24, but so far 
looking at the diffs I don't see anything that could have caused this change 
in behavior.
-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Re: test008 failed in HEAD