(ITS#6268) multi-master sync replication ldap_add error code 68 bug - openldap-bugs

20 Aug 2009


Full_Name: Barry Colston
Version: 2.4.17
OS: Fedora 10
URL: 
Submission from: (NULL) (209.255.208.219)

While testing sync replication, I encountered a situation in which a previously
deleted DN cannot be added again because ldapadd receives error code 68
(entryAlreadyExists). If I perform an ldapsearch for the DN, the DN is not found.
If I try to add the DN, the add fails with error code 68. If I run slapcat for
that DN, slapcat displays the record.

I have 3 multi-master servers in my configuration; all 3 run on the same
physical server, listening on different ports and with separate copies of the
BDB databases. Each server replicates to the other 2 servers (i.e., server 1
replicates to servers 2 and 3, server 2 replicates to servers 1 and 3, and
server 3 replicates to servers 1 and 2) using refreshAndPersist mode. I run 3
shell scripts simultaneously, each of which adds a set of parent/child records
using ldapadd and LDIF files, then deletes the records using ldapdelete (each
shell script adds and deletes about 267 records, and each script operates on a
separate set of DNs). All 3 shell scripts issue their ldapadd and ldapdelete
commands against server 1 and repeat the add/delete cycle 10 times before
exiting. After all 3 shell scripts finish, I compare the server 1 records
against the server 2 records, and the server 1 records against the server 3
records, listing any differences. The way I normally run the shell scripts,
800 records are added and then deleted 10 times, for a total of 8000
adds/deletes.
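
The scripts themselves are not included in this report. As a rough
illustration of the workload, the Python sketch below builds one script's add
LDIF and its delete order. The suffix dc=example,dc=com, the
ou=scriptN / ou=parentN / cn=childN layout, and the object classes are
hypothetical assumptions, chosen only so that 89 parents with 2 children each
give the 267 entries per script mentioned above.

```python
# Hypothetical sketch: the reporter's scripts are not part of the report, so the
# suffix, object classes and tree shape below are all assumptions.


def build_workload(script_id: int, parents: int, children_per_parent: int):
    """Return (add_ldif, delete_dns) for one script's own set of DNs."""
    base = f"ou=script{script_id},dc=example,dc=com"  # hypothetical; assumed to exist already
    records: list[tuple[str, list[str]]] = []        # add order: each parent before its children
    for p in range(parents):
        parent_dn = f"ou=parent{p},{base}"
        records.append((parent_dn, ["objectClass: organizationalUnit", f"ou: parent{p}"]))
        for c in range(children_per_parent):
            records.append((f"cn=child{c},{parent_dn}",
                            ["objectClass: device", f"cn: child{c}"]))
    add_ldif = "\n\n".join("\n".join([f"dn: {dn}", *attrs]) for dn, attrs in records) + "\n"
    # Reverse the add order so every child is deleted before its parent
    # (a non-leaf entry cannot be deleted).
    delete_dns = [dn for dn, _ in reversed(records)]
    return add_ldif, delete_dns


# 89 parents x 2 children = 267 entries per script ("about 267");
# 3 scripts give ~800 entries per cycle, 10 cycles give ~8000 adds
# (and as many deletes).
add_ldif, delete_dns = build_workload(1, parents=89, children_per_parent=2)
per_cycle = 3 * len(delete_dns)   # 801
total_adds = 10 * per_cycle       # 8010
```

The LDIF could be written to a file and passed to
`ldapadd -x -H ldap://localhost:<port> -f <file>`, and the DN list (one per
line) to `ldapdelete -x -H ldap://localhost:<port> -f <file>`; bind options are
omitted here.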

After running the above 3 shell scripts multiple times (without restarting
slapd between runs), ldapadd fails to add some records, returning error code 68.
When this happens, I run ldapsearch for the DN of the record that received the
68 error; ldapsearch does not find the record (which is correct, because the
record was deleted). If I run slapcat for the DN against server 1's BDB files,
slapcat finds the record, but it is listed with an objectClass and
structuralObjectClass of "glue" (which differ from the values the record had
when it was added). Running slapcat against the other 2 master servers (server 2
and server 3) does not display the record. When this error occurs, there are
usually multiple records that fail to add with error code 68.

I have removed all BDB index files and rerun slapindex, but the DN is still not
found by ldapsearch, and adding it still fails with error code 68.

This condition cannot be reproduced on demand, but if I run my
ldapadd/ldapdelete scripts multiple times, it eventually occurs.

This error appears to be related to the value of "syncprov-sessionlog" in my
slapd.conf file. I have tried different values of "syncprov-sessionlog", with
the following results:
- syncprov-sessionlog 200: the ldapadd 68 error usually occurs on run 2 or 3
  (i.e., only the first run of 8000 adds/deletes works OK)
- syncprov-sessionlog 5000: the ldapadd 68 error usually occurs on run 3 or 4
  (i.e., the first 2 runs of 8000 adds/deletes work OK)
- syncprov-sessionlog 50000: the ldapadd 68 error usually occurs on run 6 or 7
  (i.e., the first 5 runs of 8000 adds/deletes work OK)
- syncprov-sessionlog not specified: usually on run 2 or 3, one of the slapd
  servers crashes with a segmentation fault (usually server 1, but sometimes
  one of the other servers). Example of the crash output:
    *** glibc detected *** /tmp/reptest/openldap/openldap-2.4.17/libexec/slapd: malloc(): memory corruption (fast): 0x9e61f860 ***

I am using BDB 4.6.21 and have tested both with and without the 4 BDB patches
applied; the error 68 occurs in both cases.