Re: (ITS#8493) Under heavy modrdn load, masters desync - openldap-bugs

3 Sep 2016


--On Saturday, September 03, 2016 4:51 PM +0000 quanah@zimbra.com wrote:
...
--On Saturday, September 03, 2016 6:15 AM +0000 quanah@openldap.org wrote:
...
Full_Name: Quanah Gibson-Mount
Version: 2.4.44+ITS8432
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.111.52.177)
Trying to reproduce another ITS, I discovered a new bug.  When doing
MODRDN ops on one master, the other master keeps going out of sync.
Specifically:
Sep  3 01:12:17 zre-ldap002 slapd[29206]: syncrepl_message_to_op: rid=100 be_modrdn uid=user.924,ou=people,dc=zre-ldap002,dc=eng,dc=zimbra,dc=com (32)
Sep  3 01:12:17 zre-ldap002 slapd[29206]: do_syncrep2: rid=100 delta-sync lost sync on (reqStart=20160903051215.747829Z,cn=accesslog), switching to REFRESH
Note that this master also has a replica.  The replica never rejected a
single one of these MODRDNs coming from this master, which means that
either:
a) The data on the master spontaneously corrupted at some point
or
b) The master wrote the MODRDNs to the accesslog, which the replica
picked up, but did not itself apply the MODRDN changes to its own database.
In the end, of the 50,000 MODRDNs it was processing, the master returned
error 32 (noSuchObject) for 441 of them.
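
For anyone wanting to reproduce that count, here is a minimal sketch that
tallies be_modrdn results by LDAP result code from syslog lines.  It assumes
each log entry sits on a single line in the format shown above; the function
name and the sample list are made up for illustration and are not part of
slapd or its tooling.

```python
import re
from collections import Counter

# Matches syncrepl log lines of the form shown in this report, e.g.
#   ... syncrepl_message_to_op: rid=100 be_modrdn <dn> (32)
# Assumption: one log entry per line, result code in trailing parentheses.
MODRDN_RE = re.compile(
    r"syncrepl_message_to_op: rid=(?P<rid>\d+) be_modrdn (?P<dn>.+?) "
    r"\((?P<rc>\d+)\)\s*$"
)


def count_modrdn_results(lines):
    """Return a Counter mapping LDAP result code -> number of be_modrdn lines."""
    counts = Counter()
    for line in lines:
        match = MODRDN_RE.search(line)
        if match:
            counts[int(match.group("rc"))] += 1
    return counts


# Sample input: the two log lines quoted in this report, unwrapped.
sample = [
    "Sep  3 01:12:17 zre-ldap002 slapd[29206]: syncrepl_message_to_op: "
    "rid=100 be_modrdn uid=user.924,ou=people,dc=zre-ldap002,dc=eng,"
    "dc=zimbra,dc=com (32)",
    "Sep  3 01:12:17 zre-ldap002 slapd[29206]: do_syncrep2: rid=100 "
    "delta-sync lost sync on (reqStart=20160903051215.747829Z,cn=accesslog), "
    "switching to REFRESH",
]

if __name__ == "__main__":
    for code, n in sorted(count_modrdn_results(sample).items()):
        print(f"result code {code}: {n} be_modrdn line(s)")
```

Passing an open log file instead of the sample list works the same way,
since the function only iterates over lines.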
After the master that was not accepting direct writes re-synced with the
master accepting writes, it still had 403 of 50,000 entries wrong.  So did
its replica.  So the master isn't writing the changes to the accesslog,
which rules out (b).  So it's a third option, (c): the master rejects a
valid op, never syncs correctly, and in the end two thirds of my servers
have invalid databases.
I see zero indication that using a sessionlog works around
http://www.openldap.org/its/index.cgi/?findid=8125 at all.  I still end
up with missed entries even with everything *in* the sessionlog.
--Quanah
--
Quanah Gibson-Mount