Full_Name: Barry Colston
Version: 2.4.17
OS: Fedora 10
URL:
Submission from: (NULL) (209.255.208.219)
While testing sync replication, I encountered a situation in which a previously deleted DN cannot be added again: the ldapadd command fails with error code 68 (entryAlreadyExists). If I perform an ldapsearch command for the DN, the DN is not found. If I perform a slapcat command for that DN, however, slapcat displays the record.
I have 3 multi-master servers in my configuration; all 3 run on the same physical machine, listening on different ports and with separate copies of the BDB databases. Each server replicates to the other 2 servers (i.e., server 1 replicates to server 2 and server 3, server 2 replicates to server 1 and server 3, and server 3 replicates to server 1 and server 2) using refreshAndPersist mode. I execute 3 shell scripts simultaneously. Each script adds a set of parent/child records using the ldapadd command and LDIF files, then deletes the records using the ldapdelete command. Each script adds and deletes about 267 records, and each script operates on a separate set of DNs. The 3 scripts issue the ldapadd and ldapdelete commands against server 1 and repeat the add/delete cycle 10 times before exiting. After all 3 scripts finish, I compare the server 1 records against the server 2 records and against the server 3 records, listing any differences. A normal run of the 3 scripts therefore adds and then deletes about 800 records 10 times, for a total of about 8000 adds and 8000 deletes.
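The add/delete cycle described above can be sketched as a small helper that builds the command lines for one script. This is only an illustration of the procedure, not the actual scripts used in the test: the URI, bind DN, and file names are hypothetical placeholders, and the DN file is assumed to list child entries before their parents so that ldapdelete can remove them in order.

```python
from typing import List


def build_cycle_commands(uri: str, bind_dn: str, password_file: str,
                         ldif_file: str, dn_file: str,
                         cycles: int = 10) -> List[List[str]]:
    """Return the ldapadd/ldapdelete command lines for `cycles` add/delete rounds.

    All arguments are placeholders chosen for illustration. The flags used
    (-x, -H, -D, -y, -f) are standard options of the OpenLDAP client tools.
    """
    # Options shared by both tools: simple bind against one server.
    common = ["-x", "-H", uri, "-D", bind_dn, "-y", password_file]
    commands = []
    for _ in range(cycles):
        # Add the parent/child records from the LDIF file.
        commands.append(["ldapadd"] + common + ["-f", ldif_file])
        # Delete the same records; the DN file holds one DN per line.
        commands.append(["ldapdelete"] + common + ["-f", dn_file])
    return commands


if __name__ == "__main__":
    # Hypothetical values; server 1 is assumed to listen on port 9011.
    plan = build_cycle_commands("ldap://localhost:9011",
                                "cn=Manager,dc=example,dc=com",
                                "passwd.txt", "set1.ldif", "set1.dns")
    for cmd in plan:
        print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run` to execute it; running three such plans in parallel, one per record set, would approximate the load described in this report.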
After executing the above 3 shell scripts multiple times (without bringing down slapd between executions), some records fail to be added because the ldapadd command returns error code 68. After this occurs, I execute an ldapsearch command for the DN of a record that received the 68 error; the ldapsearch command fails to find the record (which is correct, because the record was deleted). If I perform a slapcat command for the DN against server 1's BDB files, slapcat finds the record, but it is listed with an objectClass and structuralObjectClass of "glue" (which differ from the values the record had when it was added). Slapcat performed against the 2 other master servers (server 2 and server 3) does not display a record. When this error occurs, there are usually multiple records that fail to add with error code 68.
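To list every leftover entry of this kind at once, the LDIF produced by slapcat can be scanned for entries whose objectClass or structuralObjectClass is "glue". The following is a minimal sketch under stated assumptions: the function names are made up for illustration, the input is assumed to be plain slapcat LDIF text, and base64-encoded DNs ("dn::") are not decoded.

```python
from typing import List


def unfold_ldif(ldif_text: str) -> List[str]:
    """Join LDIF continuation lines (lines starting with one space)."""
    lines: List[str] = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def find_glue_dns(ldif_text: str) -> List[str]:
    """Return the DNs of entries marked with the "glue" object class."""
    glue: List[str] = []
    dn = None
    is_glue = False
    # A trailing blank line flushes the last entry.
    for line in unfold_ldif(ldif_text) + [""]:
        if line == "":
            if dn is not None and is_glue:
                glue.append(dn)
            dn, is_glue = None, False
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name, value = name.lower(), value.strip()
        if name == "dn":
            dn = value
        elif name in ("objectclass", "structuralobjectclass"):
            if value.lower() == "glue":
                is_glue = True
    return glue


if __name__ == "__main__":
    # Hypothetical sample; real input would be the output of slapcat.
    sample = (
        "dn: cn=child1,ou=parent1,dc=example,dc=com\n"
        "objectClass: glue\n"
        "structuralObjectClass: glue\n"
        "\n"
        "dn: ou=parent2,dc=example,dc=com\n"
        "objectClass: organizationalUnit\n"
        "ou: parent2\n"
    )
    print(find_glue_dns(sample))
```

Running this against the slapcat output of each of the 3 servers would show whether the glue entries exist only on server 1, as observed here.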
I have removed all BDB index files and rerun the slapindex command, but the DN is still not found with the ldapsearch command and fails to be added because of a 68 error code.
This condition is not repeatable on demand, but if I run my scripts doing ldapadd/ldapdelete multiple times, it will eventually occur.
This error appears to be related to the value specified in my slapd.conf file for "syncprov-sessionlog". I have changed the value of "syncprov-sessionlog", with the following results:
syncprov-sessionlog 5000 - usually execution 3 or 4 results in the ldapadd 68 error (i.e., the first 2 executions of 8000 adds/deletes work OK)
syncprov-sessionlog 50000 - usually execution 6 or 7 results in the ldapadd 68 error (i.e., the first 5 executions of 8000 adds/deletes work OK)
syncprov-sessionlog 200 - usually execution 2 or 3 results in the ldapadd 68 error (i.e., the first execution of 8000 adds/deletes works OK)
syncprov-sessionlog not specified - usually execution 2 or 3 results in one of the slapd servers crashing with a segmentation fault (usually server 1, but sometimes one of the other servers). Example of crash output:
*** glibc detected *** /tmp/reptest/openldap/openldap-2.4.17/libexec/slapd: malloc(): memory corruption (fast): 0x9e61f860 ***
I am using BDB 4.6.21 and have tested both with and without the 4 BDB patches applied; the 68 error occurs in both cases.